Planet Bozo

September 25, 2020

Worse Than Failure: Error'd: Where to go, Next?

"In this screenshot, 'Lyckades' means 'Succeeded' and the buttons say 'Try again' and 'Cancel'. There is no 'Next' button," wrote Martin W.

 

"I have been meaning to send a card, but I just wasn't planning on using PHP's eval() to send it," Andrew wrote.

 

Martyn R. writes, "I was trying to connect to PA VPN and it seems they think that downgrading my client software will help."

 

"What the heck, Doordash? I was expecting a little more variety from you guys...but, to be honest, I gotta wonder what 'null' flavored cheesecake is like," Joshua M. writes.

 

Nicolas wrote, "NaN exclusive posts? I'm already on the inside loop. After all, I love my grandma!"

 


September 24, 2020

Worse Than Failure: CodeSOD: A Generic Comment

To my mind, code comments are important to explain why the code does what it does, not so much what it does. Ideally, the what is clear enough from the code that you don’t have to explain it. Today, we have no code, but we have some comments.

Chris recently was reviewing some C# code from 2016, and found a little conversation in the comments, which may or may not explain whats or whys. Line numbers included for, ahem, context.

4550: //A bit funky, but entirely understandable: Something that is a C# generic on the storage side gets
4551: //represented on the client side as an array. Why? A C# generic is rather specific, i.e., Java
4552: //doesn't have, for example, a Generic List class. So we're messing with arrays. No biggie.

Now, honestly, I’m more confused than I probably would have been just from the code. Presumably as we’re sending things to a client, we’re going to serialize it to an intermediate representation, so like, sure, arrays. The comment probably tells me why, but it’s hardly a necessary explanation here. And what does Java have to do with anything? And also, Java absolutely does support generics, so even if the Java trivia were relevant, it’s not accurate.

I’m not the only one who had some… questions. The comment continues:

4553: //
4554: //WTF does that have to do with anything? Who wrote this inane, stupid comment, 
4555: //but decided not to put comments on anything useful?

Not to sound judgmental, but if you’re having flamewars in your code comments, you may not be on a healthy, well-functioning team.

Then again, if this is someplace in the middle of your file, and you’re on line 4550, you probably have some other problems going on.


September 23, 2020

Worse Than Failure: CodeSOD: A Random While

A bit ago, Aurelia shared with us a backwards for loop. Code which wasn’t wrong, but was just… weird. Well, now we’ve got some code which is just plain wrong, in a number of ways.

The goal of the following Java code is to generate some number of random numbers between 1 and 9, and pass them off to a space-separated file.

StringBuffer buffer = new StringBuffer();
long count = 0;
long numResults = GetNumResults();

while (count < numResults)
{
	ArrayList<BigDecimal> numbers = new ArrayList<BigDecimal>();
	while (numbers.size() < 1)
	{
		int randInt = random.nextInt(10);
		long randLong = randInt & 0xffffffffL;
		if (!numbers.contains(new BigDecimal(randLong)) && (randLong != 0))
		{
			buffer.append(randLong);
			buffer.append(" ");
			numbers.add(new BigDecimal(randLong));
		}
		System.out.println("Random Integer: " + randInt + ", Long Integer: " + randLong);	
	}
	
	outFile.writeLine(buffer.toString()); 
	buffer = new StringBuffer();
	
	count++;
}

Pretty quickly, we get a sense that something is up, with the while (count < numResults): this begs to be a for loop. It’s not wrong to while this, but it’s suspicious.

Then, right away, we create an ArrayList<BigDecimal>. There is no reasonable purpose to using a BigDecimal to hold a value between 1 and 9. But the rails don’t really start to come off until we get into the inner loop.

while (numbers.size() < 1)
{
	int randInt = random.nextInt(10);
	long randLong = randInt & 0xffffffffL;
	if (!numbers.contains(new BigDecimal(randLong)) && (randLong != 0))
	…

This loop condition guarantees that we’ll only ever have one element in the list, which means our numbers.contains check doesn’t mean much, does it?

But honestly, that doesn’t hold a candle to the promotion of randInt to randLong, complete with an & 0xffffffffL, which guarantees… well, nothing. It’s completely unnecessary here. We might do that sort of thing when we’re bitshifting and need to mask out for certain bytes, but here it does nothing.

Also note the (randLong != 0) check. Because they use random.nextInt(10), that generates a number in the range 0–9, but we want 1 through 9, so if we draw a zero, we need to re-roll. A simple and common solution to this would be to do random.nextInt(9) + 1, but at least we now understand the purpose of the while (numbers.size() < 1) loop: we keep trying until we get a non-zero value.

And honestly, I should probably point out that they include a println to make sure that both the int and the long versions match, but how could they not?

Nothing here is necessary. None of this code has to be this way. You don’t need the StringBuffer. You don’t need nested while loops. You don’t need the ArrayList<BigDecimal>, you don’t need the conversion between integer types. You don’t need the debugging println.
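For comparison, here is a minimal sketch of what the routine boils down to once the unnecessary parts are gone. The original is Java; this sketch is in Python, and the names random_lines and num_results are made up for illustration. It assumes the caller writes each returned line to the output file, and it keeps the trailing space after each number only because the original appends one.

```python
import random


def random_lines(num_results, rng=None):
    """Return num_results lines, each holding one random digit from 1 to 9.

    Hypothetical helper: the name and signature are illustrative only.
    """
    rng = rng or random.Random()
    # randrange(9) yields 0..8, so adding 1 gives 1..9 with no re-roll,
    # the same idea as random.nextInt(9) + 1 in Java.
    return [f"{rng.randrange(9) + 1} " for _ in range(num_results)]


if __name__ == "__main__":
    for line in random_lines(5):
        print(line)
```

No list, no BigDecimal, no int-to-long conversion, and a single loop.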


XKCD: Message Boards

etbe: Qemu (KVM) and 9P (Virtfs) Mounts

I’ve tried setting up the Qemu (in this case KVM as it uses the Qemu code in question) 9P/Virtfs filesystem for sharing files to a VM. Here is the Qemu documentation for it [1].

VIRTFS="-virtfs local,path=/vmstore/virtfs,security_model=mapped-xattr,id=zz,writeout=immediate,fmode=0600,dmode=0700,mount_tag=zz"
VIRTFS="-virtfs local,path=/vmstore/virtfs,security_model=passthrough,id=zz,writeout=immediate,mount_tag=zz"

Above are the 2 configuration snippets I tried on the server side. The first uses mapped xattrs (which means that all files will have the same UID/GID and on the host XATTRs will be used for storing the Unix permissions) and the second uses passthrough which requires KVM to run as root and gives the same permissions on the host as on the VM. The advantages of passthrough are better performance through writing less metadata and having the same permissions in host and VM. The advantages of mapped XATTRs are running KVM/Qemu as non-root and not having a SUID file in the VM imply a SUID file in the host.

Here is the link to Bonnie++ output comparing Ext3 on a KVM block device (stored on a regular file in a BTRFS RAID-1 filesystem on 2 SSDs on the host), a NFS share from the host from the same BTRFS filesystem, and virtfs shares of the same filesystem. The only tests that Ext3 doesn’t win are some of the latency tests; latency is based on the worst case, not the average. I expected Ext3 to win most tests, but didn’t expect it to lose any latency tests.

Here is a link to Bonnie++ output comparing just NFS and Virtfs. It’s obvious that Virtfs compares poorly, giving about half the performance on many tests. Surprisingly the only tests where Virtfs compared well to NFS were the file creation tests, where I expected Virtfs with mapped XATTRs to do poorly due to the extra metadata.

Here is a link to Bonnie++ output comparing only Virtfs. The options are mapped XATTRs with default msize, mapped XATTRs with 512k msize (I don’t know if this made a difference, the results are within the range of random differences), and passthrough. There’s an obvious performance benefit in passthrough for the small file tests due to the lower metadata overhead, but as creating small files isn’t a bottleneck on most systems a 20% to 30% improvement in that area probably doesn’t matter much. The result from the random seeks test in passthrough is unusual; I’ll have to do more testing on that.

SE Linux

On Virtfs the XATTR used for SE Linux labels is passed through to the host. So every label used in a VM has to be valid on the host and accessible to the context of the KVM/Qemu process. That’s not really an option so you have to use the context mount option. Having the mapped XATTR mode work for SE Linux labels is a necessary feature.

Conclusion

The msize mount option in the VM doesn’t appear to do anything and it doesn’t appear in /proc/mounts, I don’t know if it’s even supported in the kernel I’m using.

The passthrough and mapped XATTR modes perform closely enough that there doesn’t seem to be a benefit of one over the other.

NFS gives significant performance benefits over Virtfs while also using less CPU time in the VM. It has the issue of files named .nfs* hanging around if the VM crashes while programs were using deleted files. It’s also better known: ask for help with an NFS problem and you are more likely to get advice than when asking for help with a virtfs problem.

Virtfs might be a better option for accessing databases than NFS due to its internal operation probably being a better map to Unix filesystem semantics, but running database servers on the host is probably a better choice anyway.

Virtfs generally doesn’t seem to be worth using. I had hoped for performance that was better than NFS but the only benefit I seemed to get was avoiding the .nfs* file issue.

The best options for storage for a KVM/Qemu VM seem to be Ext3 for files that are only used on one VM and for which the size won’t change suddenly or unexpectedly (particularly the root filesystem) and NFS for everything else.

September 22, 2020

Worse Than Failure: CodeSOD: A Cutt Above

We just discussed ViewState last week, and that may have inspired Russell F to share with us this little snippet.

private ConcurrentQueue<AppointmentCuttOff> lstAppointmentCuttOff { get { object o = ViewState["lstAppointmentCuttOff"]; if (o == null) return null; else return (ConcurrentQueue<AppointmentCuttOff>)o; } set { ViewState["lstAppointmentCuttOff"] = value; } }

This pattern is used for pretty much all of the ViewState data that this code interacts with, and if you look at the null check, you can see that it's unnecessary. Our code checks for a null, and if we have one… returns null. The entire get block could just be: return (ConcurrentQueue<AppointmentCuttOff>)ViewState["lstAppointmentCuttOff"]

The bigger glitch here is the data-type. While there is a queue of appointments, that queue is never accessed across threads, so there's no need for a threadsafe ConcurrentQueue.

But I really love the name of the variable we store in ViewState. We have Hungarian notation, which calls it a lst, which isn't technically correct, though it is iterable, so maybe that's what they meant, but if the point of Hungarian notation is to make the code more clear, this isn't helping.

But what I really love is that these are CuttOffs, which just sounds like some retail brand attempting to sell uncomfortably short denim. It'll be next year's summer trend, mark my words!


September 21, 2020

XKCD: Volcano Dinosaur

September 19, 2020

etbe: Burning Lithium Ion Batteries

I had an old Nexus 4 phone that was expanding and decided to test some of the theories about battery combustion.

The first claim that often gets made is that if the plastic seal on the outside of the battery is broken then the battery will catch fire. I tested this by cutting the battery with a craft knife. With every cut the battery sparked a bit and then when I levered up layers of the battery (it seems to be multiple flat layers of copper and black stuff inside the battery) there were more sparks. The battery warmed up; it’s plausible that in a confined environment it could get hot enough to set something on fire. But when the battery was resting on a brick in my backyard that wasn’t going to happen.

The next claim is that a Li-Ion battery fire will be increased with water. The first thing to note is that Li-Ion batteries don’t contain Lithium metal (the Lithium high power non-rechargeable batteries do). Lithium metal will seriously go off if exposed to water. But lots of other Lithium compounds will also react vigorously with water (like Lithium oxide for example). After cutting through most of the center of the battery I dripped some water in it. The water boiled vigorously and the corners of the battery (which were furthest away from the area I cut) felt warmer than they did before adding water. It seems that significant amounts of energy are released when water reacts with whatever is inside the Li-Ion battery. The reaction was probably giving off hydrogen gas but didn’t appear to generate enough heat to ignite hydrogen (which is when things would really get exciting). Presumably if a battery was cut in the presence of water while in an enclosed space that traps hydrogen then the sparks generated by the battery reacting with air could ignite hydrogen generated from the water and give an exciting result.

It seems that a CO2 fire extinguisher would be best for a phone/tablet/laptop fire as that removes oxygen and cools it down. If that isn’t available then a significant quantity of water will do the job, water won’t stop the reaction (it can prolong it), but it can keep the reaction to below 100C which means it won’t burn a hole in the floor and the range of toxic chemicals released will be reduced.

The rumour that a phone fire on a plane could do a “China syndrome” type thing and melt through the Aluminium body of the plane seems utterly bogus. I gave it a good try and was unable to get a battery to burn through its plastic and metal foil case. A spare battery for a laptop in checked luggage could be a major problem for a plane if it ignited. But a battery in the passenger area seems unlikely to be a big problem if plenty of water is dumped on it to prevent the plastic case from burning and polluting the air.

I was not able to get a result that was even worthy of a photograph. I may do further tests with laptop batteries.

September 18, 2020

XKCD: Voting

September 17, 2020

etbe: Dell BIOS Updates

I have just updated the BIOS on a Dell PowerEdge T110 II. The process isn’t too difficult, Google for the machine name and BIOS, download a shell script encoded firmware image and GPG signature, then run the script on the system in question.

One problem is that the Dell GPG key isn’t signed by anyone. How hard would it be to get a few well connected people in the Linux community to sign the key used for signing Linux scripts for updating the BIOS? I would be surprised if Dell doesn’t employ a few people who are well connected in the Linux community, they should just ask all employees to sign such GPG keys! Failing that there are plenty of other options. I’d be happy to sign the Dell key if contacted by someone who can prove that they are a responsible person in Dell. If I could phone Dell corporate and ask for the engineering department and then have someone tell me the GPG fingerprint I’ll sign the key and that problem will be partially solved (my key is well connected but you need more than one signature).

The next issue is how to determine that a BIOS update works. What you really don’t want is to have a BIOS update fail and brick your system! So the Linux update process loads the image into something (special firmware RAM maybe) and then reboots the system and the reboot then does a critical part of the update. If the reboot doesn’t work then you end up with the old version of the BIOS. This is overall a good thing.

The PowerEdge T110 II is a workstation with an NVidia video card (I tried an ATI card but that wouldn’t boot for unknown reasons). The Nouveau driver has some issues. One thing I have done to work around some Nouveau issues is to create a file “~/.config/plasma-workspace/env/nouveau-broken.sh” (for KDE sessions) with the following contents:

export LIBGL_ALWAYS_SOFTWARE=1

I previously wrote about using this just for Kmail to stop it crashing [1]. But after doing that I still had other problems with video and disabling all GL on the NVidia card was necessary.

The latest problem I’ve had is that even when using that configuration things don’t go well. When I run the “reboot” command I end up with a kernel message about the GPU not responding and then it doesn’t reboot. That means that the BIOS update doesn’t apply: a hard reboot signals to the system that the new BIOS wasn’t good and I end up with the old BIOS again. I discovered that disabling sddm (the latest xdm program in Debian) from starting on boot meant that a reboot command would work. Then I ran the BIOS update script and its reboot command worked and gave a successful BIOS update.

So I’ve gone from a 2013 BIOS to a 2018 BIOS! The update list says that some CVEs have been addressed, but the spectre-meltdown-checker doesn’t report any fewer vulnerabilities.

September 16, 2020

XKCD: Common Star Types

September 15, 2020

etbe: More About the PowerEdge T710

I’ve got the T710 (mentioned in my previous post [1]) online. When testing the T710 at home I noticed that sometimes the VGA monitor I was using would start flickering when in some parts of the BIOS setup; it seemed that the horizontal sync wasn’t working properly. It didn’t seem to be a big deal at the time. When I deployed it the KVM display that I had planned to use with it mostly didn’t display anything. When the display was working the KVM keyboard wouldn’t work (and would prevent a regular USB keyboard from working if they were both connected at the same time). The VGA output of the T710 also wouldn’t work with my VGA->HDMI device so I couldn’t get it working with my portable monitor.

Fortunately the Dell front panel has a display and tiny buttons that allow configuring the IDRAC IP address, so I was able to get IDRAC going. One thing Dell really should do is allow the down button to change 0 to 9 when entering numbers, that would make it easier to enter 8.8.8.8 for the DNS server. Another thing Dell should do is make the default gateway have a default value according to the IP address and netmask of the server.

When I got IDRAC going it was easy to set up a serial console, boot from a rescue USB device, create a new initrd with the driver for the MegaRAID controller, and then reboot into the server image.

When I transferred the SSDs from the old server to the newer Dell server the problem I had was that the Dell drive caddies had no holes in suitable places for attaching SSDs. I ended up just pushing the SSDs in so they are hanging in mid air attached only by the SATA/SAS connectors. Plugging them in took the space from the above drive, so instead of having 2*3.5″ disks I have 1*2.5″ SSD and need the extra space to get my hand in. The T710 is designed for 6*3.5″ disks and I’m going to have trouble if I ever want to have more than 3*2.5″ SSDs. Fortunately I don’t think I’ll need more SSDs.

After booting the system I started getting alerts about a “fault” in one SSD, with no detail on what the fault might be. My guess is that the SSD in question is M.2 and it’s in a M.2 to regular SATA adaptor which might have some problems. The data seems fine though, a BTRFS scrub found no checksum errors. I guess I’ll have to buy a replacement SSD soon.

I configured the system to use the “nosmt” kernel command line option to disable hyper-threading (which won’t provide much performance benefit but which makes certain types of security attacks much easier). I’ve configured BOINC to run on 6/8 CPU cores and surprisingly that didn’t cause the fans to be louder than when the system was idle. It seems that a system that is designed for 6 SAS disks doesn’t need a lot of cooling when run with SSDs.

July 17, 2020

Dave Hall: If You’re not Using YAML for CloudFormation Templates, You’re Doing it Wrong

In my last blog post, I promised a rant about using YAML for CloudFormation templates. Here it is. If you persevere to the end I’ll also show you how to convert your existing JSON based templates to YAML.

Many of the points I raise below don’t just apply to CloudFormation. They are general comments about why you should use YAML over JSON for configuration when you have a choice.

One criticism of YAML is its reliance on indentation. A lot of the code I write these days is Python, so indentation being significant is normal. Use a decent editor or IDE and this isn’t a problem. It doesn’t matter if you’re using JSON or YAML, you will want to validate and lint your files anyway. How else will you find that trailing comma in your JSON object?

Now we’ve got that out of the way, let me try to convince you to use YAML.

As developers we are regularly told that we need to document our code. CloudFormation is Infrastructure as Code. If it is code, then we need to document it. That starts with the Description property at the top of the file. If you use JSON for your templates, that’s it, you have no other opportunity to document your templates. On the other hand, if you use YAML you can add inline comments. Anywhere you need a comment, drop in a hash # and your comment. Your team mates will thank you.

JSON templates don’t support multiline strings. These days many developers have 4K or ultra wide monitors, but we don’t want a string that spans the full width of our 34” screen. Text becomes harder to read once you exceed that “90ish” character limit. With JSON your multiline string becomes "[90ish-characters]\n[another-90ish-characters]\n[and-so-on]". If you opt for YAML, you can use the greater than symbol (>) and then start your multiline string like so:

Description: >
  This is the first line of my Description
  and it continues on my second line
  and I'll finish it on my third line.

As you can see, it is much easier to work with multiline strings in YAML than JSON.
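To see what the JSON side of that comparison looks like, here is a small sketch using Python's standard json module. It is only an illustration: the description text is the one from the example above, and the encoded variable name is arbitrary.

```python
import json

description = (
    "This is the first line of my Description\n"
    "and it continues on my second line\n"
    "and I'll finish it on my third line."
)

# json.dumps has to emit the whole value on one line, with each
# line break written as the two-character escape \n.
encoded = json.dumps({"Description": description})

if __name__ == "__main__":
    print(encoded)
```

Printing encoded shows the three lines run together into one long string, which is exactly what you end up editing by hand in a JSON template.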

“Folded blocks” like the one above are created using the > and replace new lines with spaces. This allows you to format your text in a more readable format, but allows a machine to use it as intended. If you want to preserve the new lines, use the pipe (|) to create a “literal block”. This is great for inline Lambda functions where the code remains readable and maintainable.

  APIFunction:
    Type: AWS::Lambda::Function
    Properties:
      Code:
        ZipFile: |
          import json
          import random


          def lambda_handler(event, context):
              return {"statusCode": 200, "body": json.dumps({"value": random.random()})}
      FunctionName: "GetRandom"
      Handler: "index.lambda_handler"
      MemorySize: 128
      Role: !GetAtt LambdaServiceRole.Arn
      Runtime: "python3.7"
      Timeout: 5
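To make the difference between the two block styles concrete, here is a rough Python sketch that mimics them by hand. It is not a YAML parser: the fold and literal helpers are made-up names, and they only cover the simple case of non-blank lines at the same indentation (real folded blocks treat blank and more-indented lines specially).

```python
import textwrap

TEXT = """\
  This is the first line of my Description
  and it continues on my second line
  and I'll finish it on my third line.
"""


def literal(block):
    """Approximate a literal block (|): drop the common indent, keep line breaks."""
    return textwrap.dedent(block).strip("\n") + "\n"


def fold(block):
    """Approximate a folded block (>): same, but join the lines with spaces."""
    return " ".join(literal(block).splitlines()) + "\n"


if __name__ == "__main__":
    print(repr(fold(TEXT)))
    print(repr(literal(TEXT)))
```

The literal version keeps relative indentation intact, which is why it suits inline Python code, while the folded version suits long prose values such as Description.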

Both JSON and YAML require you to escape multibyte characters. That’s less of an issue with CloudFormation templates as generally you’re only using the ASCII character set.

In a YAML file generally you don’t need to quote your strings, but in JSON double quotes are used everywhere: keys, string values and so on. If your string contains a quote you need to escape it. The same goes for tabs, new lines, backslashes and so on. JSON based CloudFormation templates can be hard to read because of all the escaping. It also makes it harder to handcraft your JSON when your code is a long escaped string on a single line.

Some configuration in CloudFormation can only be expressed as JSON. Step Functions and some of the AppSync objects in CloudFormation only allow inline JSON configuration. You can still use a YAML template and it is easier if you do when working with these objects.

The JSON only configuration needs to be inlined in your template. If you’re using JSON you have to supply this as an escaped string, rather than nested objects. If you’re using YAML you can inline it as a literal block. Both YAML and JSON templates support functions such as Sub being applied to these strings, but it is so much more readable with YAML. See this Step Function example lifted from the AWS documentation:

MyStateMachine:
  Type: "AWS::StepFunctions::StateMachine"
  Properties:
    DefinitionString:
      !Sub |
        {
          "Comment": "A simple AWS Step Functions state machine that automates a call center support session.",
          "StartAt": "Open Case",
          "States": {
            "Open Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:open_case",
              "Next": "Assign Case"
            }, 
            "Assign Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:assign_case",
              "Next": "Work on Case"
            },
            "Work on Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:work_on_case",
              "Next": "Is Case Resolved"
            },
            "Is Case Resolved": {
              "Type": "Choice",
              "Choices": [
                {
                  "Variable": "$.Status",
                  "NumericEquals": 1,
                  "Next": "Close Case"
                },
                {
                  "Variable": "$.Status",
                  "NumericEquals": 0,
                  "Next": "Escalate Case"
                }
              ]
            },
            "Close Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:close_case",
              "End": true
            },
            "Escalate Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:escalate_case",
              "Next": "Fail"
            },
            "Fail": {
              "Type": "Fail",
              "Cause": "Engage Tier 2 Support."
            }
          }
        }
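For contrast, here is a sketch of what a JSON template has to carry for the same kind of resource. It uses Python's standard json module to build the fragment; the definition is a trimmed-down, hypothetical one-state version of the example above, and template_fragment is just an illustrative name. Fn::Sub is the long form of !Sub.

```python
import json

# A trimmed-down, hypothetical state machine definition (one state only).
definition = {
    "StartAt": "Open Case",
    "States": {
        "Open Case": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:open_case",
            "End": True,
        }
    },
}

# In a JSON template the definition must be a string, so it is serialised
# once here and then escaped again when the outer template is serialised.
resource = {
    "Type": "AWS::StepFunctions::StateMachine",
    "Properties": {
        "DefinitionString": {"Fn::Sub": json.dumps(definition)},
    },
}
template_fragment = json.dumps({"MyStateMachine": resource})

if __name__ == "__main__":
    print(template_fragment)
```

Every quote inside the definition comes out as \" in the printed fragment, all on one line, which is the readability problem the YAML literal block avoids.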

If you’re feeling lazy you can use inline JSON for IAM policies that you’ve copied from elsewhere. It’s quicker than converting them to YAML.

YAML templates are smaller and more compact than the same configuration stored in a JSON based template. Smaller yet more readable is winning all round in my book.

If you’re still not convinced that you should use YAML for your CloudFormation templates, go read Amazon’s blog post from 2017 advocating the use of YAML based templates.

Amazon makes it easy to convert your existing templates from JSON to YAML. cfn-flip is a Python based AWS Labs tool for converting CloudFormation templates between JSON and YAML. I will assume you’ve already installed cfn-flip. Once you’ve done that, converting your templates with some automated cleanups is just a command away:

cfn-flip --clean template.json template.yaml

git rm the old json file, git add the new one and git commit and git push your changes. Now you’re all set for your new life using YAML based CloudFormation templates.

If you want to learn more about YAML files in general, I recommend you check out Learn X in Y Minutes’ Guide to YAML. If you want to learn more about YAML based CloudFormation templates, check Amazon’s Guide to CloudFormation Templates.

July 09, 2020

Dave Hall: Logging Step Functions to CloudWatch

Many AWS Services log to CloudWatch. Some do it out of the box, others need to be configured to log properly. When Amazon released Step Functions, they didn’t include support for logging to CloudWatch. In February 2020, Amazon announced Step Functions could now log to CloudWatch. Step Functions still support CloudTrail logs, but CloudWatch logging is more useful for many teams.

Users need to configure Step Functions to log to CloudWatch. This is done on a per State Machine basis. Of course you could click around the console to enable it, but that doesn’t scale. If you use CloudFormation to manage your Step Functions, it is only a few extra lines of configuration to add the logging support.

In my example I will assume you are using YAML for your CloudFormation templates. I’ll save my “if you’re using JSON for CloudFormation you’re doing it wrong” rant for another day. This is a cut down example from one of my services:

---
AWSTemplateFormatVersion: '2010-09-09'
Description: StepFunction with Logging Example.
Parameters:
Resources:
  StepFunctionExecRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service: !Sub "states.${AWS::Region}.amazonaws.com"
          Action:
          - sts:AssumeRole
      Path: "/"
      Policies:
      - PolicyName: StepFunctionExecRole
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Effect: Allow
            Action:
            - lambda:InvokeFunction
            - lambda:ListFunctions
            Resource: !Sub "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:my-lambdas-namespace-*"
          - Effect: Allow
            Action:
            - logs:CreateLogDelivery
            - logs:GetLogDelivery
            - logs:UpdateLogDelivery
            - logs:DeleteLogDelivery
            - logs:ListLogDeliveries
            - logs:PutResourcePolicy
            - logs:DescribeResourcePolicies
            - logs:DescribeLogGroups
            Resource: "*"
  MyStateMachineLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: /aws/stepfunction/my-step-function
      RetentionInDays: 14
  DashboardImportStateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      StateMachineName: my-step-function
      StateMachineType: STANDARD
      LoggingConfiguration:
        Destinations:
          - CloudWatchLogsLogGroup:
             LogGroupArn: !GetAtt MyStateMachineLogGroup.Arn
        IncludeExecutionData: True
        Level: ALL
      DefinitionString:
        !Sub |
        {
          ... JSON Step Function definition goes here
        }
      RoleArn: !GetAtt StepFunctionExecRole.Arn

The key pieces in this example are the second statement in the IAM Role with all the logging permissions, the LogGroup defined by MyStateMachineLogGroup and the LoggingConfiguration section of the Step Function definition.

The IAM role permissions are copied from the example policy in the AWS documentation for using CloudWatch Logging with Step Functions. The CloudWatch IAM permissions model is pretty weak, so we need to grant these broad permissions.

The LogGroup definition creates the log group in CloudWatch. You can use whatever value you want for the LogGroupName. I followed the Amazon convention of prefixing everything with /aws/[service-name]/ and then appended the Step Function name. I recommend using the RetentionInDays configuration. It stops old logs sticking around forever. In my case I send all my logs to ELK, so I don’t need to retain them in CloudWatch long term.

Finally we use the LoggingConfiguration to tell AWS where we want to send our logs. You can only specify a single destination in Destinations. The IncludeExecutionData setting determines if the inputs and outputs of each function call are logged. You should not enable this if you are passing sensitive information between your steps. The verbosity of logging is controlled by Level. Amazon has a page on Step Function log levels. For dev you probably want to use ALL to help with debugging but in production you probably only need ERROR level logging.

I removed the Parameters and Outputs from the template. Use them as you need to.

April 01, 2020

Dave Hall: Zoom's Make or Break Moment

Zoom is experiencing massive growth as large sections of the workforce transition to working from home. At the same time many problems with Zoom are coming to light. This is their make or break moment. If they fix the problems they end up with a killer video conferencing app. The alternative is that they join Cisco's Webex in the dumpster fire of awful enterprise software.

In the interest of transparency I am a paying Zoom customer and I use it for hours every day. I also use Webex (under protest) as it is a client's video conferencing platform of choice.

In the middle of last year Jonathan Leitschuh disclosed two bugs in Zoom with security and privacy implications. There was a string of failures that led to these bugs. To Zoom’s credit they published a long blog post about why these “features” were there in the first place.

Over the last couple of weeks other issues with Zoom have surfaced. “Zoom bombing”, or using random 9 digit numbers to find meetings, has become a thing. It is possible because Zoom’s meeting rooms have a 9 digit code to join. That’s really handy when you have to dial in and enter the number on your telephone keypad. The down side is that a random number has a 1 in 999 999 999 chance of matching any given meeting, and the odds of landing in some meeting improve with every meeting in progress. Zoom does offer the option of requiring a password or PIN for each call. Unfortunately it isn’t the default. Publishing a blog post on how to secure your meetings isn’t enough. The app needs to be more secure by default. It should default to enabling a 6 digit PIN when creating a meeting.

The Intercept is reporting Zoom’s marketing department got a little carried away when describing the encryption used in the product. This is an area where words matter. Encryption in transit is a baseline requirement in communication tools these days. Zoom has this, but their claims about end-to-end encryption appear to be false. End-to-end encryption is very important for some use cases. I await the blog post explaining this one.

I don’t know why Proton Mail’s privacy issues blog post got so much attention. It appears to be based on someone skimming the documentation rather than any real testing. Regardless, the post got a lot of traction. Some of the same issues were flagged by the EFF.

Until recently Zoom’s FAQ read “Does Zoom sell Personal Data? […] Depends what you mean by ‘sell’”. I’m sure that sounded great in a meeting but it is worrying when you read it as a customer. Once called out on social media it was quickly updated and a blog post published. In the post, Zoom assures users it isn’t selling their data.

Joseph Cox reported late last week that Zoom was sending data to Facebook every time someone used their iOS app. It is unclear if Joe gave Zoom an opportunity to fix the issue before publishing the article. The company pushed out a fix after the story broke.

The most recent issue broke yesterday about the Zoom macOS installer behaving like malware. This seems like pretty shady behaviour, much like their automatic reinstaller that was fixed last year. To his credit, Zoom founder and CEO Eric Yuan engaged with the issue on Twitter. This will be one to watch over the coming days.

Over the last year I have seen a consistent pattern when Zoom is called out on valid security and privacy issues with their platform. They respond publicly with “oops my bad” blog posts. Many of the issues appear to be a result of them trying to deliver a great user experience. Unfortunately they sometimes lean too far toward the UX and ignore the security and privacy implications of their choices. I hope that over the coming months we see Zoom correct this balance as problems are called out. If they do, they will end up with an amazing platform in terms of UX while keeping their users safe.

Update: Since publishing this post, additional issues with Zoom were reported. Zoom's CEO announced the company was committed to fixing their product.

November 16, 2019

Dave HallDrupalSouth Diversity Scholarship Winner Announced

A few weeks ago we announced our diversity scholarship for DrupalSouth. Before announcing the winner I want to talk a bit about our experience doing this for the first time.

DrupalSouth is the largest Drupal event held in Oceania every year. It provides a great marketing opportunity for businesses wanting to promote their products and services to the Drupal community. Dave Hall Consulting planned to sponsor DrupalSouth to promote our new training business - Getting It Live training. By the time we got organised all of the (affordable) sponsorship opportunities had gone. After considering various opportunities around the event, we felt the best way of investing a similar amount of money and giving something back to the community was through a diversity scholarship.

The community provided positive feedback about the initiative. However, despite the enthusiasm and working our networks to get a range of applicants, we only ended up with 7 applicants. They were all guys. One applicant was from Australia, the rest were from overseas. About half the applicants dropped out when contacted to confirm that they could cover their own travel and visa expenses.

We are likely to offer other scholarships in the future. We will start earlier and explore other channels for promoting the program.

The scholarship has been awarded to Yogesh Ingale, from Mumbai, India. Over the last 3 years Yogesh has been employed by Tata Consultancy Services’ digital operations team as a DevOps Engineer. During this time he has worked with Drupal, Cloud Computing, Python and Web Technologies. Yogesh is interested in automating processes. When he’s not working, Yogesh likes to travel, automate things and write blog posts. Disclaimer: I know Yogesh through my work with one of my clients. Sometimes the Drupal community feels pretty small.

Congratulations Yogesh! I am looking forward to seeing you in Hobart.

If you want to meet Yogesh before DrupalSouth, we still have some seats available for our 2 day git training course that’s running on 25-26 November. If you won’t be in Hobart, contact us to discuss your training needs.