Planet LUV

September 23, 2020

etbeQemu (KVM) and 9P (Virtfs) Mounts

I’ve tried setting up the Qemu (in this case KVM as it uses the Qemu code in question) 9P/Virtfs filesystem for sharing files to a VM. Here is the Qemu documentation for it [1].

VIRTFS="-virtfs local,path=/vmstore/virtfs,security_model=mapped-xattr,id=zz,writeout=immediate,fmode=0600,dmode=0700,mount_tag=zz"
VIRTFS="-virtfs local,path=/vmstore/virtfs,security_model=passthrough,id=zz,writeout=immediate,mount_tag=zz"

Above are the 2 configuration snippets I tried on the server side. The first uses mapped xattrs (which means that all files will have the same UID/GID and on the host XATTRs will be used for storing the Unix permissions) and the second uses passthrough which requires KVM to run as root and gives the same permissions on the host as on the VM. The advantages of passthrough are better performance through writing less metadata and having the same permissions in host and VM. The advantages of mapped XATTRs are running KVM/Qemu as non-root and not having a SUID file in the VM imply a SUID file in the host.

Here is the link to Bonnie++ output comparing Ext3 on a KVM block device (stored on a regular file in a BTRFS RAID-1 filesystem on 2 SSDs on the host), a NFS share from the host from the same BTRFS filesystem, and virtfs shares of the same filesystem. The only tests that Ext3 doesn’t win are some of the latency tests, latency is based on the worst-case not the average. I expected Ext3 to win most tests, but didn’t expect it to lose any latency tests.

Here is a link to Bonnie++ output comparing just NFS and Virtfs. It’s obvious that Virtfs compares poorly, giving about half the performance on many tests. Surprisingly the only tests where Virtfs compared well to NFS were the file creation tests which I expected Virtfs with mapped XATTRs to do poorly due to the extra metadata.

Here is a link to Bonnie++ output comparing only Virtfs. The options are mapped XATTRs with default msize, mapped XATTRs with 512k msize (I don’t know if this made a difference, the results are within the range of random differences), and passthrough. There’s an obvious performance benefit in passthrough for the small file tests due to the less metadata overhead, but as creating small files isn’t a bottleneck on most systems a 20% to 30% improvement in that area probably doesn’t matter much. The result from the random seeks test in passthrough is unusual, I’ll have to do more testing on that.

SE Linux

On Virtfs the XATTR used for SE Linux labels is passed through to the host. So every label used in a VM has to be valid on the host and accessible to the context of the KVM/Qemu process. That’s not really an option so you have to use the context mount option. Having the mapped XATTR mode work for SE Linux labels is a necessary feature.

Conclusion

The msize mount option in the VM doesn’t appear to do anything and it doesn’t appear in /proc/mounts, I don’t know if it’s even supported in the kernel I’m using.

The passthrough and mapped XATTR modes give near enough performance that there doesn’t seem to be a benefit of one over the other.

NFS gives significant performance benefits over Virtfs while also using less CPU time in the VM. It has the issue of files named .nfs* hanging around if the VM crashes while programs were using deleted files. It’s also more well known, ask for help with an NFS problem and you are more likely to get advice than when asking for help with a virtfs problem.

Virtfs might be a better option for accessing databases than NFS due to it’s internal operation probably being a better map to Unix filesystem semantics, but running database servers on the host is probably a better choice anyway.

Virtfs generally doesn’t seem to be worth using. I had hoped for performance that was better than NFS but the only benefit I seemed to get was avoiding the .nfs* file issue.

The best options for storage for a KVM/Qemu VM seem to be Ext3 for files that are only used on one VM and for which the size won’t change suddenly or unexpectedly (particularly the root filesystem) and NFS for everything else.

September 19, 2020

etbeBurning Lithium Ion Batteries

I had an old Nexus 4 phone that was expanding and decided to test some of the theories about battery combustion.

The first claim that often gets made is that if the plastic seal on the outside of the battery is broken then the battery will catch fire. I tested this by cutting the battery with a craft knife. With every cut the battery sparked a bit and then when I levered up layers of the battery (it seems to be multiple flat layers of copper and black stuff inside the battery) there were more sparks. The battery warmed up, it’s plausible that in a confined environment that could get hot enough to set something on fire. But when the battery was resting on a brick in my backyard that wasn’t going to happen.

The next claim is that a Li-Ion battery fire will be increased with water. The first thing to note is that Li-Ion batteries don’t contain Lithium metal (the Lithium high power non-rechargeable batteries do). Lithium metal will seriously go off it exposed to water. But lots of other Lithium compounds will also react vigorously with water (like Lithium oxide for example). After cutting through most of the center of the battery I dripped some water in it. The water boiled vigorously and the corners of the battery (which were furthest away from the area I cut) felt warmer than they did before adding water. It seems that significant amounts of energy are released when water reacts with whatever is inside the Li-Ion battery. The reaction was probably giving off hydrogen gas but didn’t appear to generate enough heat to ignite hydrogen (which is when things would really get exciting). Presumably if a battery was cut in the presence of water while in an enclosed space that traps hydrogen then the sparks generated by the battery reacting with air could ignite hydrogen generated from the water and give an exciting result.

It seems that a CO2 fire extinguisher would be best for a phone/tablet/laptop fire as that removes oxygen and cools it down. If that isn’t available then a significant quantity of water will do the job, water won’t stop the reaction (it can prolong it), but it can keep the reaction to below 100C which means it won’t burn a hole in the floor and the range of toxic chemicals released will be reduced.

The rumour that a phone fire on a plane could do a “China syndrome” type thing and melt through the Aluminium body of the plane seems utterly bogus. I gave it a good try and was unable to get a battery to burn through it’s plastic and metal foil case. A spare battery for a laptop in checked luggage could be a major problem for a plane if it ignited. But a battery in the passenger area seems unlikely to be a big problem if plenty of water is dumped on it to prevent the plastic case from burning and polluting the air.

I was not able to get a result that was even worthy of a photograph. I may do further tests with laptop batteries.

September 17, 2020

etbeDell BIOS Updates

I have just updated the BIOS on a Dell PowerEdge T110 II. The process isn’t too difficult, Google for the machine name and BIOS, download a shell script encoded firmware image and GPG signature, then run the script on the system in question.

One problem is that the Dell GPG key isn’t signed by anyone. How hard would it be to get a few well connected people in the Linux community to sign the key used for signing Linux scripts for updating the BIOS? I would be surprised if Dell doesn’t employ a few people who are well connected in the Linux community, they should just ask all employees to sign such GPG keys! Failing that there are plenty of other options. I’d be happy to sign the Dell key if contacted by someone who can prove that they are a responsible person in Dell. If I could phone Dell corporate and ask for the engineering department and then have someone tell me the GPG fingerprint I’ll sign the key and that problem will be partially solved (my key is well connected but you need more than one signature).

The next issue is how to determine that a BIOS update works. What you really don’t want is to have a BIOS update fail and brick your system! So the Linux update process loads the image into something (special firmware RAM maybe) and then reboots the system and the reboot then does a critical part of the update. If the reboot doesn’t work then you end up with the old version of the BIOS. This is overall a good thing.

The PowerEdge T110 II is a workstation with an NVidia video card (I tried an ATI card but that wouldn’t boot for unknown reasons). The Nouveau driver has some issues. One thing I have done to work around some Nouveau issues is to create a file “~/.config/plasma-workspace/env/nouveau-broken.sh” (for KDE sessions) with the following contents:

export LIBGL_ALWAYS_SOFTWARE=1

I previously wrote about using this just for Kmail to stop it crashing [1]. But after doing that I still had other problems with video and disabling all GL on the NVidia card was necessary.

The latest problem I’ve had is that even when using that configuration things don’t go well. When I run the “reboot” command I end up with a kernel message about the GPU not responding and then it doesn’t reboot. That means that the BIOS update doesn’t apply, a hard reboot signals to the system that the new BIOS wasn’t good and I end up with the old BIOS again. I discovered that disabling sddm (the latest xdm program in Debian) from starting on boot meant that a reboot command would work. Then I ran the BIOS update script and it’s reboot command worked and gave a successful BIOS update.

So I’ve gone from a 2013 BIOS to a 2018 BIOS! The update list says that some CVEs have been addressed, but the spectre-meltdown-checker doesn’t report any fewer vulnerabilities.

September 15, 2020

etbeMore About the PowerEdge T710

I’ve got the T710 (mentioned in my previous post [1]) online. When testing the T710 at home I noticed that sometimes the VGA monitor I was using would start flickering when in some parts of the BIOS setup, it seemed that the horizonal sync wasn’t working properly. It didn’t seem to be a big deal at the time. When I deployed it the KVM display that I had planned to use with it mostly didn’t display anything. When the display was working the KVM keyboard wouldn’t work (and would prevent a regular USB keyboard from working if they were both connected at the same time). The VGA output of the T710 also wouldn’t work with my VGA->HDMI device so I couldn’t get it working with my portable monitor.

Fortunately the Dell front panel has a display and tiny buttons that allow configuring the IDRAC IP address, so I was able to get IDRAC going. One thing Dell really should do is allow the down button to change 0 to 9 when entering numbers, that would make it easier to enter 8.8.8.8 for the DNS server. Another thing Dell should do is make the default gateway have a default value according to the IP address and netmask of the server.

When I got IDRAC going it was easy to setup a serial console, boot from a rescue USB device, create a new initrd with the driver for the MegaRAID controller, and then reboot into the server image.

When I transferred the SSDs from the old server to the newer Dell server the problem I had was that the Dell drive caddies had no holes in suitable places for attaching SSDs. I ended up just pushing the SSDs in so they are hanging in mid air attached only by the SATA/SAS connectors. Plugging them in took the space from the above drive, so instead of having 2*3.5″ disks I have 1*2.5″ SSD and need the extra space to get my hand in. The T710 is designed for 6*3.5″ disks and I’m going to have trouble if I ever want to have more than 3*2.5″ SSDs. Fortunately I don’t think I’ll need more SSDs.

After booting the system I started getting alerts about a “fault” in one SSD, with no detail on what the fault might be. My guess is that the SSD in question is M.2 and it’s in a M.2 to regular SATA adaptor which might have some problems. The data seems fine though, a BTRFS scrub found no checksum errors. I guess I’ll have to buy a replacement SSD soon.

I configured the system to use the “nosmt” kernel command line option to disable hyper-threading (which won’t provide much performance benefit but which makes certain types of security attacks much easier). I’ve configured BOINC to run on 6/8 CPU cores and surprisingly that didn’t cause the fans to be louder than when the system was idle. It seems that a system that is designed for 6 SAS disks doesn’t need a lot of cooling when run with SSDs.

July 17, 2020

Dave HallIf You’re not Using YAML for CloudFormation Templates, You’re Doing it Wrong

In my last blog post, I promised a rant about using YAML for CloudFormation templates. Here it is. If you persevere to the end I’ll also show you have to convert your existing JSON based templates to YAML.

Many of the points I raise below don’t just apply to CloudFormation. They are general comments about why you should use YAML over JSON for configuration when you have a choice.

One criticism of YAML is its reliance on indentation. A lot of the code I write these days is Python, so indentation being significant is normal. Use a decent editor or IDE and this isn’t a problem. It doesn’t matter if you’re using JSON or YAML, you will want to validate and lint your files anyway. How else will you find that trailing comma in your JSON object?

Now we’ve got that out of the way, let me try to convince you to use YAML.

As developers we are regularly told that we need to document our code. CloudFormation is Infrastructure as Code. If it is code, then we need to document it. That starts with the Description property at the top of the file. If you JSON for your templates, that’s it, you have no other opportunity to document your templates. On the other hand, if you use YAML you can add inline comments. Anywhere you need a comment, drop in a hash # and your comment. Your team mates will thank you.

JSON templates don’t support multiline strings. These days many developers have 4K or ultra wide monitors, we don’t want a string that spans the full width of our 34” screen. Text becomes harder to read once you exceed that “90ish” character limit. With JSON your multiline string becomes "[90ish-characters]\n[another-90ish-characters]\n[and-so-on"]. If you opt for YAML, you can use the greater than symbol (>) and then start your multiline comment like so:

Description: >
  This is the first line of my Description
  and it continues on my second line
  and I'll finish it on my third line.

As you can see it much easier to work with multiline string in YAML than JSON.

“Folded blocks” like the one above are created using the > replace new lines with spaces. This allows you to format your text in a more readable format, but allow a machine to use it as intended. If you want to preserve the new line, use the pipe (|) to create a “literal block”. This is great for an inline Lambda functions where the code remains readable and maintainable.

  APIFunction:
    Type: AWS::Lambda::Function
    Properties:
      Code:
        ZipFile: |
          import json
          import random


          def lambda_handler(event, context):
              return {"statusCode": 200, "body": json.dumps({"value": random.random()})}
      FunctionName: "GetRandom"
      Handler: "index.lambda_handler"
      MemorySize: 128
      Role: !GetAtt LambdaServiceRole.Arn
      Runtime: "python3.7"
		Timeout: 5

Both JSON and YAML require you to escape multibyte characters. That’s less of an issue with CloudFormation templates as generally you’re only using the ASCII character set.

In a YAML file generally you don’t need to quote your strings, but in JSON double quotes are used every where, keys, string values and so on. If your string contains a quote you need to escape it. The same goes for tabs, new lines, backslashes and and so on. JSON based CloudFormation templates can be hard to read because of all the escaping. It also makes it harder to handcraft your JSON when your code is a long escaped string on a single line.

Some configuration in CloudFormation can only be expressed as JSON. Step Functions and some of the AppSync objects in CloudFormation only allow inline JSON configuration. You can still use a YAML template and it is easier if you do when working with these objects.

The JSON only configuration needs to be inlined in your template. If you’re using JSON you have to supply this as an escaped string, rather than nested objects. If you’re using YAML you can inline it as a literal block. Both YAML and JSON templates support functions such as Sub being applied to these strings, it is so much more readable with YAML. See this Step Function example lifted from the AWS documentation:

MyStateMachine:
  Type: "AWS::StepFunctions::StateMachine"
  Properties:
    DefinitionString:
      !Sub |
        {
          "Comment": "A simple AWS Step Functions state machine that automates a call center support session.",
          "StartAt": "Open Case",
          "States": {
            "Open Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:open_case",
              "Next": "Assign Case"
            }, 
            "Assign Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:assign_case",
              "Next": "Work on Case"
            },
            "Work on Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:work_on_case",
              "Next": "Is Case Resolved"
            },
            "Is Case Resolved": {
                "Type" : "Choice",
                "Choices": [ 
                  {
                    "Variable": "$.Status",
                    "NumericEquals": 1,
                    "Next": "Close Case"
                  },
                  {
                    "Variable": "$.Status",
                    "NumericEquals": 0,
                    "Next": "Escalate Case"
                  }
              ]
            },
             "Close Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:close_case",
              "End": true
            },
            "Escalate Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:escalate_case",
              "Next": "Fail"
            },
            "Fail": {
              "Type": "Fail",
              "Cause": "Engage Tier 2 Support."    }   
          }
        }

If you’re feeling lazy you can use inline JSON for IAM policies that you’ve copied from elsewhere. It’s quicker than converting them to YAML.

YAML templates are smaller and more compact than the same configuration stored in a JSON based template. Smaller yet more readable is winning all round in my book.

If you’re still not convinced that you should use YAML for your CloudFormation templates, go read Amazon’s blog post from 2017 advocating the use of YAML based templates.

Amazon makes it easy to convert your existing templates from JSON to YAML. cfn-flip is aPython based AWS Labs tool for converting CloudFormation templates between JSON and YAML. I will assume you’ve already installed cfn-flip. Once you’ve done that, converting your templates with some automated cleanups is just a command away:

cfn-flip --clean template.json template.yaml

git rm the old json file, git add the new one and git commit and git push your changes. Now you’re all set for your new life using YAML based CloudFormation templates.

If you want to learn more about YAML files in general, I recommend you check our Learn X in Y Minutes’ Guide to YAML. If you want to learn more about YAML based CloudFormation templates, check Amazon’s Guide to CloudFormation Templates.

July 09, 2020

Dave HallLogging Step Functions to CloudWatch

Many AWS Services log to CloudWatch. Some do it out of the box, others need to be configured to log properly. When Amazon released Step Functions, they didn’t include support for logging to CloudWatch. In February 2020, Amazon announced StepFunctions could now log to CloudWatch. Step Functions still support CloudTrail logs, but CloudWatch logging is more useful for many teams.

Users need to configure Step Functions to log to CloudWatch. This is done on a per State Machine basis. Of course you could click around he console to enable it, but that doesn’t scale. If you use CloudFormation to manage your Step Functions, it is only a few extra lines of configuration to add the logging support.

In my example I will assume you are using YAML for your CloudFormation templates. I’ll save my “if you’re using JSON for CloudFormation you’re doing it wrong” rant for another day. This is a cut down example from one of my services:

---
AWSTemplateFormatVersion: '2010-09-09'
Description: StepFunction with Logging Example.
Parameters:
Resources:
  StepFunctionExecRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service: !Sub "states.${AWS::Region}.amazonaws.com"
          Action:
          - sts:AssumeRole
      Path: "/"
      Policies:
      - PolicyName: StepFunctionExecRole
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Effect: Allow
            Action:
            - lambda:InvokeFunction
            - lambda:ListFunctions
            Resource: !Sub "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:my-lambdas-namespace-*"
          - Effect: Allow
            Action:
            - logs:CreateLogDelivery
            - logs:GetLogDelivery
            - logs:UpdateLogDelivery
            - logs:DeleteLogDelivery
            - logs:ListLogDeliveries
            - logs:PutResourcePolicy
            - logs:DescribeResourcePolicies
            - logs:DescribeLogGroups
            Resource: "*"
  MyStateMachineLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: /aws/stepfunction/my-step-function
      RetentionInDays: 14
  DashboardImportStateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      StateMachineName: my-step-function
      StateMachineType: STANDARD
      LoggingConfiguration:
        Destinations:
          - CloudWatchLogsLogGroup:
             LogGroupArn: !GetAtt MyStateMachineLogGroup.Arn
        IncludeExecutionData: True
        Level: ALL
      DefinitionString:
        !Sub |
        {
          ... JSON Step Function definition goes here
        }
      RoleArn: !GetAtt StepFunctionExecRole.Arn

The key pieces in this example are the second statement in the IAM Role with all the logging permissions, the LogGroup defined by MyStateMachineLogGroup and the LoggingConfiguration section of the Step Function definition.

The IAM role permissions are copied from the example policy in the AWS documentation for using CloudWatch Logging with Step Functions. The CloudWatch IAM permissions model is pretty weak, so we need to grant these broad permissions.

The LogGroup definition creates the log group in CloudWatch. You can use what ever value you want for the LogGroupName. I followed the Amazon convention of prefixing everything with /aws/[service-name]/ and then appended the Step Function name. I recommend using the RetentionInDays configuration. It stops old logs sticking around for ever. In my case I send all my logs to ELK, so I don’t need to retain them in CloudWatch long term.

Finally we use the LoggingConfiguration to tell AWS where we want to send out logs. You can only specify a single Destinations. The IncludeExecutionData determines if the inputs and outputs of each function call is logged. You should not enable this if you are passing sensitive information between your steps. The verbosity of logging is controlled by Level. Amazon has a page on Step Function log levels. For dev you probably want to use ALL to help with debugging but in production you probably only need ERROR level logging.

I removed the Parameters and Output from the template. Use them as you need to.

May 25, 2020

Stewart Smithop-build v2.5 firmware for the Raptor Blackbird

Well, following on from my post where I excitedly pointed out that Raptor Blackbird support: all upstream in op-build v2.5, that means I can do another in my series of (close to) upstream Blackbird firmware builds.

This time, the only difference from straight upstream op-build v2.5 is my fixes for buildroot so that I can actually build it on Fedora 32.

So, head over to https://www.flamingspork.com/blackbird/op-build-v2.5-blackbird-images/ and grab blackbird.pnor to flash it on your blackbird, let me know how it goes!

May 23, 2020

Stewart SmithRefurbishing my Macintosh Plus

Somewhere in the mid to late 1990s I picked myself up a Macintosh Plus for the sum of $60AUD. At that time there were still computer Swap Meets where old and interesting equipment was around, so I headed over to one at some point (at the St Kilda Town Hall if memory serves) and picked myself up four 1MB SIMMs to boost the RAM of it from the standard 1MB to the insane amount of 4MB. Why? Umm… because I could? The RAM was pretty cheap, and somewhere in the house to this day, I sometimes stumble over the 256KB SIMMs as I just can’t bring myself to get rid of them.

This upgrade probably would have cost close to $2,000 at the system’s release. If the Macintosh system software were better at disk caching you could have easily held the whole 800k of the floppy disk in memory and still run useful software!

One of the annoying things that started with the Macintosh was odd screws and Apple gear being hard to get into. Compare to say, the Apple ][ which had handy clips to jump inside whenever. In fitting my massive FOUR MEGABYTES of RAM back in the day, I recall using a couple of allen keys sticky-taped together to be able to reach in and get the recessed Torx screws. These days, I can just order a torx bit off Amazon and have it arrive pretty quickly. Well, two torx bits, one of which is just too short for the job.

My (dusty) Macintosh Plus

One thing had always struck me about it, it never really looked like the photos of the Macintosh Plus I saw in books. In what is an embarrassing number of years later, I learned that a lot can be gotten from the serial number printed on the underside of the front of the case.

So heading over to the My Old Mac Serial Number Decoder I can find out:

Manufactured in: F => Fremont, California, USA
Year of production: 1985
Week of production: 14
Production number: 3V3 => 4457
Model ID: M0001WP => Macintosh 512K (European Macintosh ED)

Your Macintosh 512K (European Macintosh ED) was the 4457th Mac manufactured during the 14th week of 1985 in Fremont, California, USA.

Pretty cool! So it is certainly a Plus as the logic board says that, but it’s actually an upgraded 512k! If you think it was madness to have a GUI with only 128k of RAM in the original Macintosh, you’d be right. I do not envy anybody who had one of those.

Some time a decent (but not too many, less than 10) years ago, I turn on the Mac Plus to see if it still worked. It did! But then… some magic smoke started to come out (which isn’t so good), but the computer kept working! There’s something utterly bizarre about looking at a computer with smoke coming out of it that continues to function perfectly fine.

Anyway, as the smoke was coming out, I decided that it would be an opportune time to turn it off, open doors and windows, and put it away until I was ready to deal with it.

One Global Pandemic Later, and now was the time.

I suspected it was going to be a capacitor somewhere that blew, and figured that I should replace it, and probably preemptively replace all the other electrolytic capacitors that could likely leak and cause problems.

First thing’s first though: dismantle it and clean everything. First, taking the case off. Apple is not new to the game of annoying screws to get into things. I ended up spending $12 on this set on Amazon, as the T10 bit can actually reach the screws holding the case on.

Cathode Ray Tubes are not to be messed with. We’re talking lethal voltages here. It had been many years since electricity went into this thing, so all was good. If this all doesn’t work first time when reassembling it, I’m not exactly looking forward to discharging a CRT and working on it.

The inside of my Macintosh Plus, with lots of grime.

You can see there’s grime everywhere. It’s not the worst in the world, but it’s not great (and kinda sticky). Obviously, this needs to be cleaned! The best way to do that is take a lot of photos, dismantle everything, and clean it a bit at a time.

There’s four main electronic components inside a Macintosh Plus:

  1. The CRT itself
  2. The floppy disk drive
  3. The Logic Board (what Mac people call what PC people call the motherboard)
  4. The Analog Board

There’s also some metal structure that keeps some things in place. There’s only a few connectors between things, which are pretty easy to remove. If you don’t know how to discharge a CRT and what the dangers of them are you should immediately go and find out through reading rather than finding out by dying. I would much prefer it if you dyed (because creative fun) rather than died.

Once the floppy connector and the power connector is unplugged, the logic board slides out pretty easily. You can see from the photo below that I have the 4MB of RAM installed and the resistor you need to snip is, well, snipped (but look really closely for that). Also, grime.

Macintosh Plus Logic Board

Cleaning things? Well, there’s two ways that I have used (and considering I haven’t yet written the post with “hurray, it all works”, currently take it with a grain of salt until I write that post). One: contact cleaner. Two: detergent.

Macintosh Plus Logic Board (being washed in my sink)

I took the route of cleaning things first, and then doing recapping adventures. So it was some contact cleaner on the boards, and then some soaking with detergent. This actually all worked pretty well.

Logic Board Capacitors:

  • C5, C6, C7, C12, C13 = 33uF 16V 85C (measured at 39uF, 38uF, 38uF, 39uF)
  • C14 = 1uF 50V (measured at 1.2uF and then it fluctuated down to around 1.15uF)

Analog Board Capacitors

  • C1 = 35V 3.9uF (M) measured at 4.37uF
  • C2 = 16V 4700uF SM measured at 4446uF
  • C3 = 16V 220uF +105C measured at 234uF
  • C5 = 10V 47uF 85C measured at 45.6uF
  • C6 = 50V 22uF 85C measured at 23.3uF
  • C10 = 16V 33uF 85C measured at 37uF
  • C11 = 160V 10uF 85C measured at 11.4uF
  • C12 = 50V 22uF 85C measured at 23.2uF
  • C18 = 16V 33uF 85C measured at 36.7uF
  • C24 = 16V 2200uF 105C measured at 2469uF
  • C27 = 16V 2200uF 105C measured at 2171uF (although started at 2190 and then went down slowly)
  • C28 = 16V 1000uF 105C measured at 638uF, then 1037uF, then 1000uF, then 987uF
  • C30 = 16V 2200uF 105C measured at 2203uF
  • C31 = 16V 220uF 105C measured at 236uF
  • C32 = 16V 2200uF 105C measured at 2227uF
  • C34 = 200V 100uF 85C measured at 101.8uF
  • C35 = 200V 100uF 85C measured at 103.3uF
  • C37 = 250V 0.47uF measured at <exploded>. wheee!
  • C38 = 200V 100uF 85C measured at 103.3uF
  • C39 = 200V 100uF 85C mesaured at 99.6uF (with scorch marks from next door)
  • C42 = 10V 470uF 85C measured at 556uF
  • C45 = 10V 470uF 85C measured at 227uF, then 637uF then 600uF

I’ve ordered an analog board kit from https://console5.com/store/macintosh-128k-512k-plus-analog-pcb-cap-kit-630-0102-661-0462.html and when trying to put them in, I learned that the US Analog board is different to the International Analog board!!! Gah. Dammit.

Note that C30, C32, C38, C39, and C37 were missing from the kit I received (probably due to differences in the US and International boards). I did have an X2 cap (for C37) but it was 0.1uF not 0.47uF. I also had two extra 1000uF 16V caps.

Macintosh Repair and Upgrade Secrets (up to the Mac SE no less!) holds an Appendix with the parts listing for both the US and International Analog boards, and this led me to conclude that they are in fact different boards rather than just a few wires that are different. I am not sure what the “For 120V operation, W12 must be in place” and “for 240V operation, W12 must be removed” writing is about on the International Analog board, but I’m not quite up to messing with that at the moment.

So, I ordered the parts (linked above) and waited (again) to be able to finish re-capping the board.

I found https://youtu.be/H9dxJ7uNXOA video to be a good one for learning a bunch about the insides of compact Macs, I recommend it and several others on his YouTube channel. One interesting thing I learned is that the X2 cap (C37 on the International one) is before the power switch, so could blow just by having the system plugged in and not turned on! Okay, so I’m kind of assuming that it also applies to the International board, and mine exploded while it was plugged in and switched on, so YMMV.

Additionally, there’s an interesting list of commonly failing parts. Unfortunately, this is also for the US logic board, so the tables in Macintosh Repair and Upgrade Secrets are useful. I’m hoping that I don’t have to replace anything more there, but we’ll see.

But, after the Nth round of parts being delivered….

Note the lack of an exploded capacitor

Yep, that’s where the exploded cap was before. Cleanup up all pretty nicely actually. Annoyingly, I had to run it all through a step-up transformer as the board is all set for Australian 240V rather than US 120V. This isn’t going to be an everyday computer though, so it’s fine.

Woohoo! It works. While I haven’t found my supply of floppy disks that (at least used to) work, the floppy mechanism also seems to work okay.

Next up: waiting for my Floppy Emu to arrive as it’ll certainly let it boot. Also, it’s now time to rip the house apart to find a floppy disk that certainly should have made its way across the ocean with the move…. Oh, and also to clean up the mouse and keyboard.

May 16, 2020

Stewart SmithRaptor Blackbird support: all upstream in op-build

Thanks to my most recent PR being merged, op-build v2.5 will have full support for the Raptor Blackbird! This includes support for the “IPL Monitor” that’s required to get fan control going.

Note that if you’re running Fedora 32 then you need some patches to buildroot to have it build, but if you’re building on something a little older, then upstream should build and work straight out of the box (err… git tree).

I also note that the work to get Secure Boot for an OS Kernel going is starting to make its way out for code reviews, so that’s something to look forward to (although without a TPM we’re going to need extra code).

May 13, 2020

Stewart SmithA op-build v2.5-rc1 based Raptor Blackbird Build

I have done a few builds of firmware for the Raptor Blackbird since I got mine, each of them based on upstream op-build plus a few patches. The previous one was Yet another near-upstream Raptor Blackbird firmware build that I built a couple of months ago. This new build is based off the release candidate of op-build v2.5. Here’s what’s changed:

PackageOld VersionNew Version
hcodehw030220a.opmsthw050520a.opmst
hostbootacdff8a390a2654dd52fed67bdebe2b5
kexec-lite18ec88310c4134e6b0130b3c1ea489e
libflashv6.5-228-g82aed17av6.6
linuxv5.4.22v5.4.33
linux-headersv5.4.22v5.4.33
machine-xml17e9e84d504582c88e782e30829e0d6be
occ3ab29212518e65740ab4dc96fd6cf584c42
openpower-pnor6fb8d914134d544a84175f00d9c6dc395faf3
sbec318ab00116d92f08c78fb7838495ad0aab7
skibootv6.5-228-g82aed17av6.6
Changes in my latest Blackbird build

Go grab blackbird.pnor from https://www.flamingspork.com/blackbird/stewart-blackbird-6-images/, and give it a go! Just scp it to your BMC, and flash it:

pflash -E -p /tmp/blackbird.pnor

There’s two differences from upstream op-build: my pull request to op-build, and the fixing of the (old) buildroot so that it’ll build on Fedora 32. From discussions on the openpower-firmware mailing list, it seems that one hopeful thing is to have all the Blackbird support merged in before the final op-build v2.5 is tagged. The previous op-build release (v2.4) was tagged in July 2019, so we’re about 10 months into what was a 2 month release cycle, so speculating on when that final release will be is somewhat difficult.

April 01, 2020

Dave HallZoom's Make or Break Moment

Zoom is experiencing massive growth as large sections of the workforce transition to working from home. At the same time many problems with Zoom are coming to light. This is their make or break moment. If they fix the problems they end up with a killer video conferencing app. The alternative is that they join Cisco's Webex in the dumpster fire of awful enterprise software.

In the interest of transparency I am a paying Zoom customer and I use it for hours every day. I also use Webex (under protest) as it is a client's video conferencing platform of choice.

In the middle of last year Jonathan Leitschuh disclosed two bugs in zoom with security and privacy implications . There was a string of failures that lead to these bugs. To Zoom’s credit they published a long blog post about why these “features” were there in the first place.

Over the last couple of weeks other issues with Zoom have surfaced. “Zoom bombing” or using random 9 digit numbers to find meetings has become a thing. This is caused by zoom’s meeting rooms having a 9 digit code to join. That’s really handy when you have to dial in and enter the number on your telephone keypad. The down side is that you have a 1 in 999 999 999 chance of joining a meeting when using a random number. Zoom does offer the option of requiring a password or PIN for each call. Unfortunately it isn’t the default. Publishing a blog post on how to secure your meetings isn’t enough, the app needs to be more secure by default. The app should default to enabling a 6 digit PIN when creating a meeting.

The Intercept is reporting Zoom’s marketing department got a little carried away when describing the encryption used in the product. This is an area where words matter. Encryption in transit is a base line requirement in communication tools these days. Zoom has this, but their claims about end to end encryption appear to be false. End to end encryption is very important for some use cases. I await the blog post explaining this one.

I don’t know why Proton Mail’s privacy issues blog post got so much attention. This appears to be based on someone skimming the documentation rather than any real testing. Regardless the post got a lot of traction. Some of the same issues were flagged by the EFF.

Until recently zoom’s FAQ read “Does Zoom sell Personal Data? […] Depends what you mean by ‘sell’”. I’m sure that sounded great in a meeting but it is worrying when you read it as a customer. Once called out on social media it was quickly updated and a blog post published. In the post, Zoom assures users it isn’t selling their data.

Joseph Cox reported late last week that Zoom was sending data to Facebook every time someone used their iOS app. It is unclear if Joe gave Zoom an opportunity to fix the issue before publishing the article. The company pushed out a fix after the story broke.

The most recent issue broke yesterday about the Zoom macOS installer behaving like malware. This seems pretty shady behaviour, like their automatic reinstaller that was fixed last year. To his credit, Zoom Founder and CEO, Eric Yuan engaged with the issue on twitter. This will be one to watch over the coming days.

Over the last year I have seen a consistent pattern when Zoom is called out on security and valid privacy issues with their platform. They respond publicly with “oops my bad” blog posts . Many of the issues appear to be a result of them trying to deliver a great user experience. Unfortunately they some times lean too far toward the UX and ignore the security and privacy implications of their choices. I hope that over the coming months we see Zoom correct this balance as problems are called out. If they do they will end up with an amazing platform in terms of UX while keeping their users safe.

Update Since publishing this post additional issues with Zoom were reported. Zoom's CEO announced the company was committed to fixing their product.

November 16, 2019

Dave HallDrupalSouth Diversity Scholarship Winner Announced

A few weeks ago we announced our diversity scholarship for DrupalSouth. Before announcing the winner I want to talk a bit about our experience doing this for the first time.

DrupalSouth is the largest Drupal event held in Oceania every year. It provides a great marketing opportunity for businesses wanting to promote their products and services to the Drupal community. Dave Hall Consulting planned to sponsor DrupalSouth to promote our new training business - Getting It Live training. By the time we got organised all of the (affordable) sponsorship opportunities had gone. After considering various opportunities around the event we felt the best way of investing a similar amount of money and giving something back to the community was through a diversity scholarship

The community provided positive feedback about the initiative. However despite the enthusiasm and working our networks to get a range of applicants, we only ended up with 7 applicants. They were all guys. One applicant was from Australia, the rest were from overseas. About half the applicants dropped out when contacted to confirm that they could cover their own travel and visa expenses.

We are likely to offer other scholarships in the future. We will start earlier and explore other channels for promoting the program.

The scholarship has been awarded to Yogesh Ingale, from Mumbai, India. Over the last 3 years Yogesh has been employed by Tata Consultancy Services’ digital operations team as a DevOps Engineer. During this time he has worked with Drupal, Cloud Computing, Python and Web Technologies. Yogesh is interested in automating processes. When he’s not working, Yogesh likes to travel, automate things and write blog posts. Disclaimer: I know Yogesh through my work with one of my clients. Some times the Drupal community feels pretty small.

Congratulations Yogesh! I am looking forward to seeing you in Hobart.

If you want to meet Yogesh before DrupalSouth, we still have some seats available for our 73780151419">2 day git training course that’s running on 25-26 November. If you won’t be in Hobart, contact us to discuss your training needs.

November 10, 2019

Julien GoodwinSome thoughts on Storytelling as an engineering teaching tool

Every week at work on Wednesday afternoons we have the SRE ops review, a relaxed two hour affair where SREs (& friends of, not all of whom are engineers) share interesting tidbits that have happened over the last week or so, this might be a great success, an outage, a weird case, or even a thorny unsolved problem. Usually these relate to a service the speaker is oncall for, or perhaps a dependency or customer service, but we also discuss major incidents both internal & external. Sometimes a recent issue will remind one of the old-guard (of which I am very much now a part) of a grand old story and we share those too.

Often the discussion continues well into the evening as we decant to one of the local pubs for dinner & beer, sometimes chatting away until closing time (probably quite regularly actually, but I'm normally long gone).

It was at one of these nights at the pub two months ago (sorry!), that we ended up chatting about storytelling as a teaching tool, and a colleague asked an excellent question, that at the time I didn't have a ready answer for, but I've been slowly pondering, and decided to focus on over an upcoming trip.

As I start to write the first draft of this post I've just settled in for cruise on my first international trip in over six months[1], popping over to Singapore for the Melbourne Cup weekend, and whilst I'd intended this to be a holiday, I'm so terrible at actually having a holiday[2] that I've ended up booking two sessions of storytelling time, where I present the history of Google's production networks (for those of you reading this who are current of former engineering Googlers, similar to Traffic 101). It's with this perspective of planning, and having run those sessions that I'm going to try and answer the question that I was asked.

Or at least, I'm going to split up the question I was asked and answer each part.

"What makes storytelling good"

On its own this is hard to answer, there are aspects that can help, such as good presentation skills (ideally keeping to spoken word, but simple graphs, diagrams & possibly photos can help), but a good story can be told in a dry technical monotone and still be a good story. That said, as with the rest of these items charisma helps.

"What makes storytelling interesting"

In short, a hook or connection to the audience, for a lot of my infrastructure related outage stories I have enough context with the audience to be able to tie the impact back in a way that resonates with a person. For larger disparate groups shared languages & context help ensure that I'm not just explaining to one person.

In these recent sessions one was with a group of people who work in our Singapore data centre, in that session I focused primarily on the history & evolution of our data centre fabrics, giving them context to understand why some of the (at face level) stranger design decisions have been made that way.

The second session was primarily people involved in the deployment side of our backbone networks, and so I focused more on the backbones, again linking with knowledge the group already had.

"What makes storytelling entertaining"

Entertaining storytelling is a matter of style, skills and charisma, and while many people can prepare (possibly with help) an entertaining talk, the ability to tell an entertaining story off the cuff is more of a skill, luckily for me, one I seem to do ok with. Two things that can work well are dropping in surprises, and where relevant some level of self-deprecation, however both need to be done very carefully.

Surprises can work very well when telling a story chronologically "I assumed X because Y, <five minutes of waffling>, so it turned out I hadn't proved Y like I thought, so it wasn't X, it was Z", they can help the audience to understand why a problem wasn't solved so easily, and explaining "traps for young players" as Dave Jones (of the EEVblog) likes to say can themselves be really helpful learning elements. Dropping surprises that weren't surprises to the story's protagonist generally only works if it's as a punchline of a joke, and even then it often doesn't.

Self-deprecation is an element that I've often used in the past, however more recently I've called others out on using it, and have been trying to reduce it myself, depending on the audience you might appear as a bumbling success or stupid, when the reality may be that nobody understood the situation properly, even if someone should have. In the ops review style of storytelling, it can also lead to a less experienced audience feeling much less confident in general than they should, which itself can harm productivity and careers.

If the audience already had relevant experience (presenting a classic SRE issue to other SREs for example, a network issue to network engineers, etc.) then audience interaction can work very well for engagement. "So the latency graph for database queries was going up and to the right, what would you look at?" This is also similar to one of the ways to run a "wheel of misfortune" outage simulation.

"What makes storytelling useful & informative at the same time"

In the same way as interest, to make storytelling useful & informative for the audience involves consideration for the audience, as a presenter if you know the audience, at least in broad strokes this helps. As I mentioned above, when I presented my talk to a group of datacenter-focused people I focused on the DC elements, connecting history to the current incarnations; when I presented to a group of more general networking folk a few days later, I focused more on the backbones and other elements they'd encountered.

Don't assume that a story will stick wholesale, just leaving a few keywords, or even just a vague memory with a few key words they can go digging for can make all the difference in the world. Repetition works too, sharing many interesting stories that share the same moral (for an example, one of the ops review classics is demonstrations about how lack of exponential backoff can make recovery from outages hard), hearing this over dozens of different stories over weeks (or months, or years...) it eventually seeps in as something to not even question having been demonstrated as such an obvious foundation of good systems.

When I'm speaking to an internal audience I'm happy if they simply remember that I (or my team) exist and might be worth reaching out to in future if they have questions.

Lastly, storytelling is a skill you need to practice, whether a keynote presentation in front of a few thousand people, or just telling tall takes to some mates at the pub practice helps, and eventually many of the elements I've mentioned above become almost automatic. As can probably be seen from this post I could do with some more practice on the written side.

1: As I write these words I'm aboard a Qantas A380 (QF1) flying towards Singapore, the book I'm currently reading, of all things about mechanical precision ("Exactly: How Precision Engineers Created the Modern World" or as it has been retitled for paperback "The Perfectionists"), has a chapter themed around QF32, the Qantas A380 that notoriously had to return to Singapore after an uncontained engine failure. Both the ATSB report on the incident and the captain Richard de Crespigny's book QF32 are worth reading. I remember I burned though QF32 one (very early) morning when I was stuck in GlobalSwitch Sydney waiting for approval to repatch a fibre, one of the few times I've actually dealt with the physical side of Google's production networks, and to date the only time the fact I live just a block from that facility has been used at all sensibly.

2: To date, I don't think I've ever actually had a holiday that wasn't organised by family, or attached to some conference, event or work travel I'm attending. This trip is probably the closest I've ever managed (roughly equal to my burnout trip to Hawaii in 2014), and even then I've ruined it by turning two of the three weekdays into work. I'm much better at taking breaks that simply involve not leaving home or popping back to stay with family in Melbourne.

April 27, 2019

Julien GoodwinBuilding new pods for the Spectracom 8140 using modern components

I've mentioned a bunch of times on the time-nuts list that I'm quite fond of the Spectracom 8140 system for frequency distribution. For those not familiar with it, it's simply running a 10MHz signal against a 12v DC power feed so that line-powered pods can tap off the reference frequency and use it as an input to either a buffer (10MHz output pods), decimation logic (1MHz, 100kHz etc.), or a full synthesizer (Versa-pods).

It was only in October last year that I got a house frequency standard going using an old Efratom FRK-LN which now provides the reference; I'd use a GPSDO, but I live in a ground floor apartment without a usable sky view, this of course makes it hard to test some of the GPS projects I'm doing. Despite living in a tiny apartment I have test equipment in two main places, so the 8140 is a great solution to allow me to lock all of them to the house standard.


(The rubidium is in the chunky aluminium chassis underneath the 8140)

Another benefit of the 8140 is that many modern pieces of equipment (such as my [HP/Agilent/]Keysight oscilloscope) have a single connector for reference frequency in/out, and should the external frequency ever go away it will switch back to its internal reference, but also send that back out the connector, which could lead to other devices sharing the same signal switching to it. The easy way to avoid that is to use a dedicated port from a distribution amplifier for each device like this, which works well enough until you have this situation in multiple locations.

As previously mentioned the 8140 system uses pods to add outputs, while these pods are still available quite cheaply used on eBay (as of this writing, for as low as US$8, but ~US$25/pod has been common for a while), recently the cost of shipping to Australia has gone up to the point I started to plan making my own.

By making my own pods I also get to add features that the original pods didn't have[1], I started with a quad-output pod with optional internal line termination. This allows me to have feeds for multiple devices with the annoying behaviour I mentioned earlier. The enclosure is a Pomona model 4656, with the board designed to slot in, and offer pads for the BNC pins to solder to for easy assembly.



This pod uses a Linear Technologies (now Analog Devices) LTC6957 buffer for the input stage replacing a discrete transistor & logic gate combined input stage in the original devices. The most notable change is that this stage works reliably down to -30dBm input (possibly further, couldn't test beyond that), whereas the original pods stop working right around -20dBm.

As it turns out, although it can handle lower input signal levels, in other ways including power usage it seems very similar. One notable downside is the chip tops out at 4v absolute maximum input, so a separate regulator is used just to feed this chip. The main regulator has also been changed from a 7805 to an LD1117 variant.

On this version the output stage is the same TI 74S140 dual 4-input NAND gate as was used on the original pods, just in SOIC form factor.

As with the next board there is one error on the board, the wire loop that forms the ground connection was intended to fit a U-type pin header, however the footprint I used on the boards was just too tight to allow the pins through, so I've used some thin bus wire instead.



The second major variant I designed was a combo version, allowing sine & square outputs by just switching a jumper, or isolated[2] or line-regenerator (8040TA from Spectracom) versions with a simple sub-board containing just an inductor (TA) or 1:1 transformer (isolated).



This is the second revision of that board, where the 74S140 has been replaced by a modern TI 74LVC1G17 buffer. This version of the pod, set for sine output, uses almost exactly 30mA of current (since both the old & new pods use linear supplies that's the most sensible unit), whereas the original pods are right around 33mA. The empty pods at the bottom-left are simply placeholders for 2 100 ohm resistors to add 50 ohm line termination if desired.

The board fits into the Pomona 2390 "Size A" enclosures, or for the isolated version the Pomona 3239 "Size B". This is the reason the BNC connectors have to be extended to reach the board, on the isolated boxes the BNC pins reach much deeper into the enclosure.

If the jumpers were removed, plus the smaller buffer it should be easy to fit a pod into the Pomona "Miniature" boxes too.



I was also due to create some new personal businesscards, so I arranged the circuit down to a single layer (the only jumper is the requirement to connect both ground pins on the connectors) and merged it with some text converted to KiCad footprints to make a nice card on some 0.6mm PCBs. The paper on that photo is covering the link to the build instructions, which weren't written at the time (they're *mostly* done now, I may update this post with the link later).

Finally, while I was out travelling at the start of April my new (to me) HP 4395A arrived so I've finally got some spectrum output. The output is very similar between the original and my version, with the major notable difference being that my version is 10dB worse at the third harmonic. I lack the equipment (and understanding) to properly measure phase noise, but if anyone in AU/NZ wants to volunteer their time & equipment for an afternoon I'd love an excuse for a field trip.



Spectrum with input sourced from my house rubidium (natively a 5MHz unit) via my 8140 line. Note that despite saying "ExtRef" the analyzer is synced to its internal 10811 (which is an optional unit, and uses an external jumper, hence the display note.



Spectrum with input sourced from the analyzer's own 10811, and power from the DC bias generator also from the analyzer.


1: Or at least I didn't think they had, I've since found out that there was a multi output pod, and one is currently in the post heading to me.
2: An option on the standard Spectracom pods, albeit a rare one.

January 13, 2019

Julien GoodwinTransport security for BGP, AKA BGP-STARTTLS, a proposal

Several days ago, inspired in part by an old work mail thread being resurrected I sent this image as a tweet captioned "The state of BGP transport security.":



The context of the image for those not familiar with it is this image about noSQL databases.

This triggered a bunch of discussion, with multiple people saying "so how would *you* do it", and we'll get to that (or for the TL;DR skip to the bottom), but first some background.

The tweet is a reference to the BGP protocol the Internet uses to exchange routing data between (and within) networks. This protocol (unless inside a separate container) is never encrypted, and can only be authenticated (in practice) by a TCP option known as TCP-MD5 (standardised in RFC2385). The BGP protocol itself has no native encryption or authentication support. Since routing data can often be inferred by the packets going across a link anyway, this has lead to this not being a priority to fix.

Transport authentication & encryption is a distinct issue from validation of the routing data transported by BGP, an area already being worked on by the various RPKI projects, eventually transport authentication may be able to benefit from some of the work done by those groups.

TCP-MD5 is quite limited, and while generally supported by all major BGP implementations it has one major limitation that makes it particularly annoying, in that it takes a single key, making key migration difficult (and in many otherwise sensible topologies, impossible without impact). Being a TCP option is also a pain, increasing fragility.

At the time of its introduction TCP-MD5 gave two main benefits the first was to have some basic authentication beyond the basic protocol (for which the closest element in the protocol is the validation of peer-as in the OPEN message, and a mismatch will helpfully tell you who the far side was looking for), plus making it harder to interfere with the TCP session, which on many of the TCP implementations of the day was easier than it should have been. Time, however has marched on, and protection against session interference from non-MITM is no longer needed, the major silent MITM case of Internet Exchanges using hubs is long obsolete, plus, in part due to the pain associated in changing keys many networks have a "default" key they will use when turning up a peering session, these keys are often so well known for major networks that they've often been shared on public mailing lists, eliminating what little security benefit TCP-MD5 still brings.

This has been known to be a problem for many years, and the proposed replacement TCP-AO (The TCP Authentication Option) was standardised in 2010 as RFC5925, however, to the best of my knowledge eight years later no mainstream BGP implementation supports it, and as it too is a TCP option, not only does it still has many of the downsides of MD5, but major OS kernels are much less likely to implement new ones (indeed, an OS TCP maintainer commenting such on the thread I mentioned above is what kicked off my thinking).

TCP, the wire format, is in practice unchangeable. This is one of the major reasons for QUIC, the TCP replacement protocol soon to be standardised as HTTP/3, so for universal deployment any proposal that requires changes to TCP is a non-starter.

Any solution must be implementable & deployable.
  • Implementable - BGP implementations must be able to implement it, and do so correctly, ideally with a minimum of effort.
  • Deployable - Networks need to be able to deploy it, when authentication issues occur error messages should be no worse than with standard BGP (this is an area most TCP-MD5 implementations fail at, of those I've used JunOS is a notable exception, Linux required kernel changes for it to even be *possible* to debug)


Ideally any security-critical code should already exist in a standardised form, with multiple widely-used implementations.

Fortunately for us, that exists in the form of TLS. IPSEC, while it exists, fails the deployable tests, as almost anyone who's ever had the misfortune of trying to get IPSEC working between different organisations using different implementations can attest, sure it can usually be made to work, but nowhere near as easily as TLS.

Discussions about the use of TLS for this purpose have happened before, but always quickly run into the problem of how certificates for this should be managed, and that is still an open problem, potentially the work on RPKI may eventually provide a solution here, but until that time we have a workable alternative in the form of TLS-PSK (first standardised in RFC4279), a set of TLS modes that allow the use of pre-shared keys instead of certificates (for those wondering, not only does this still exist in TLS1.3 it's in a better form). For a variety of reasons, not the least the lack of real-time clocks in many routers that may not be able to reach an NTP server until BGP is established, PSK modes are still more deployable than certificate verification today. One key benefit for TLS-PSK is it supports multiple keys to allow migration to a new key in a significantly less impactful manner.

The most obvious way to support BGP-in-TLS would simply be to listen on a new port (as is done for HTTPS for example), however there's a few reasons why I don't think such a method is deployable for BGP, primarily due to the need to update control-plane ACLs, a task that in large networks is often distant from the folk working with BGP, and in small networks may not be understood by any current staff (a situation not dissimilar to the state of TCP). Another option would simply be to use protocol multiplexing and do a TLS negotiation if a TLS hello is received, or unencrypted BGP for a BGP OPEN, this would violate the general "principal of least astonishment", and would be harder for operators to debug.

Instead I propose a design similar to that used by SMTP (where it is known as STARTTLS), during early protocol negotiation support for TLS is signalled using a zero-length capability in the BGP OPEN, the endpoints do a TLS negotiation, and then the base protocol continues inside the new TLS tunnel. Since this negotiation happens during the BGP OPEN, it does mean that other data included in the OPEN leaks. Primarily this is the ASN, but also the various other capabilities supported by the implementation (which could identify the implementation), I suggest that if TLS is required information in the initial OPEN not be validated, and standard reserved ASN be sent instead, and any other capabilities not strictly required not sent, with a fresh OPEN containing all normal information sent inside the TLS session.

Migration from TCP-MD5 is key point, however not one I can find any good answer for. Some implementations already allow TCP-MD5 to be optional, and that would allow an easy upgrade, however such support is rare, and unlikely to be more widely supported.

On that topic, allowing TLS to be optional in a consistent manner is particularly helpful, and something that I believe SHOULD be supported to allow cases like currently unauthenticated public peering sessions to be migrated to TLS with minimal impact. Allowing this does open the possibility of a downgrade attack, and make more plausible attacks causing state machine confusions (implementation believes it's talking on a TLS-secured session when it isn't).

What do we lose from TCP-MD5? Some performance, whilst this is not likely to be an issue for most situations, it is likely not an option for anyone still trying to run IPv4 full tables on a Cisco Catalyst 6500 with Sup720. We do also lose the TCP manipulation prevention aspects, however these days those are (IMO) of minimal value in practice. There's also the costs of implementations needing to include a TLS implementations, and whilst nearly every system will have one (at the very least for SSH) it may not already be linked to the routing protocol implementation.

Lastly, my apologies to anyone who has proposed this before, but my neither I nor my early reviewers were aware of such a proposal. Should such a proposal already exist, meeting the goals of implementable & deployable it may be sensible to pick that up instead.

The IETF is often said to work on "rough consensus and running code", for this proposal here's what I believe a minimal *actual* demonstration of consensus with code would be:
  • Two BGP implementations, not derived from the same source.
  • Using two TLS implementations, not derived from the same source.
  • Running on two kernels (at the very least, Linux & FreeBSD)


The TL;DR version:
  • Using a zero-length BGP capability in the BGP OPEN message implementations advertise their support for TLS
    • TLS version MUST be at least 1.3
    • If TLS is required, the AS field in the OPEN MAY be set to a well-known value to prevent information leakage, and other capabilities MAY be removed, however implementations MUST NOT require the TLS capability be the first, last or only capability in the OPEN
    • If TLS is optional, which MUST NOT be default behaviour), the OPEN MUST be (other than the capability) be the same as a session configured for no encryption
  • After the TCP client receives a conformation of TLS support from the TCP server's OPEN message, a TLS handshake begins
    • To make this deployable TLS-PSK MUST be supported, although exact configuration is TBD.
    • Authentication-only variants of TLS (ex RFC4785) REALLY SHOULD NOT be supported.
    • Standard certificate-based verification MAY be supported, and if supported MUST validate use client certificates, validating both. However, how roots of trust would work for this has not been investigated.
  • Once the TCP handshake completes the BGP state starts over with the client sending a new OPEN
    • Signalling the TLS capability in this OPEN is invalid and MUST be rejected
  • (From here, everything is unchanged from normal BGP)


Magic numbers for development:
  • Capability: (to be referred to as EXPERIMENTAL-STARTTLS) 219
  • ASN (for avoiding data leaks in OPEN messages): 123456
    • Yes this means also sending 4-byte capability. Every implementation that might possibly implement this already supports 4-byte ASNs.


The key words "MUST (BUT WE KNOW YOU WON'T)", "SHOULD CONSIDER", "REALLY SHOULD NOT", "OUGHT TO", "WOULD PROBABLY", "MAY WISH TO", "COULD", "POSSIBLE", and "MIGHT" in this document are to be interpreted as described in RFC 6919.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119.

August 23, 2018

Julien GoodwinCustom output pods for the Standard Research CG635 Clock Generator

As part of my previously mentioned side project the ability to replace crystal oscillators in a circuit with a higher quality frequency reference is really handy, to let me eliminate a bunch of uncertainty from some test setups.

A simple function generator is the classic way to handle this, although if you need square wave output it quickly gets hard to find options, with arbitrary waveform generators (essentially just DACs) the common option. If you can get away with just sine wave output an RF synthesizer is the other main option.

While researching these I discovered the CG635 Clock Generator from Stanford Research, and some time later picked one of these up used.

As well as being a nice square wave generator at arbitrary voltages these also have another set of outputs on the rear of the unit on an 8p8c (RJ45) connector, in both RS422 (for lower frequencies) and LVDS (full range) formats, as well as some power rails to allow a variety of less common output formats.

All I needed was 1.8v LVCMOS output, and could get that from the front panel output, but I'd then need a coax tail on my boards, as well as potentially running into voltage rail issues so I wanted to use the pod output instead. Unfortunately none of the pods available from Stanford Research do LVCMOS output, so I'd have to make my own, which I did.

The key chip in my custom pod is the TI SN65LVDS4, a 1.8v capable single channel LVDS reciever that operates at the frequencies I need. The only downside is this chip is only available in a single form factor, a 1.5mm x 2mm 10 pin UQFN, which is far too small to hand solder with an iron. The rest of the circuit is just some LED indicators to signal status.


Here's a rendering of the board from KiCad.

Normally "not hand solderable" for me has meant getting the board assembled, however my normal assembly house doesn't offer custom PCB finishes, and I wanted these to have white solder mask with black silkscreen as a nice UX when I go to use them, so instead I decided to try my hand at skillet reflow as it's a nice option given the space I've got in my tiny apartment (the classic tutorial on this from SparkFun is a good read if you're interested). Instead of just a simple plate used for cooking you can now buy hot plates with what are essentially just soldering iron temperature controllers, sold as pre-heaters making it easier to come close to a normal soldering profile.

Sadly, actually acquiring the hot plate turned into a bit of a mess, the first one I ordered in May never turned up, and it wasn't until mid-July that one arrived from a different supplier.

Because of the aforementioned lack of space instead of using stencils I simply hand-applied (leaded) paste, without even an assist tool (which I probably will acquire for next time), then hand-mounted the components, and dropped them on the plate to reflow. I had one resistor turn 90 degrees, and a few bridges from excessive paste, but for a first attempt I was really happy.


Here's a photo of the first two just after being taken off the hot plate.

Once the reflow was complete it was time to start testing, and this was where I ran into my biggest problems.

The two big problems were with the power supply I was using, and with my oscilloscope.

The power supply (A Keithley 228 Voltage/Current source) is from the 80's (Keithley's "BROWN" era), and while it has nice specs, doesn't have the most obvious UI. Somehow I'd set it to limit at 0ma output current, and if you're not looking at the segment lights it's easy to miss. At work I have an EEZ H24005 which also resets the current limit to zero on clear, however it's much more obvious when limiting, and a power supply with that level of UX is now on my "to buy" list.

The issues with my scope were much simpler. Currently I only have an old Rigol DS1052E scope, and while it works fine it is a bit of a pain to use, but ultimately I made a very simple mistake while testing. I was feeding in a trigger signal direct from the CG635's front outputs, and couldn't figure out why the generator was putting out such a high voltage (implausibly so). To cut the story short, I'd simply forgotten that the scope was set for use with 10x probes, and once I realised that everything made rather more sense. An oscilloscope with auto-detection for 10x probes, as well as a bunch of other features I want in a scope (much bigger screen for one), has now been ordered, but won't arrive for a while yet.

Ultimately the boards work fine, but until the new scope arrives I can't determine signal quality of them, but at least they're ready for when I'll need them, which is great for flow.

August 01, 2013

Tim ConnorsNo trains for the corporatocracy

Sigh, look, I know we don't actually live in a democracy (but a corporatocracy instead), and I should never expect the relevant ministers to care about my meek little protestations otherwise, but I keep writing these letters to ministers for transport anyway, under the vague hope that it might remind them that they're ministers for transport, and not just roads.


Dear Transport Minister, Terry Mulder,

I encourage you and your fellow ministers to read this article
("Tracking the cost", The Age, June 13 2009) from 2009, back when the
Liberals claimed to have a very different attitude, and when
circumstances seemed to mirror the current time:
http://www.theage.com.au/national/tracking-the-cost-20090612-c67m.html

The eventual costs of building the first extensions to the Melbourne
public transport system in 80 years eventually blew out from $8M to
$500M over the short life of the South Morang project; despite being a
much smaller project than the entire rail lines built cheaper by
cities such as Perth in recent years.

The increased cost is explained away as a safety requirement - it
being so important to now start building grade separated lines rather
than level crossings regardless of circumstances. Perceived safety
trumps real safety (I'd much rather be in a train than suffer from one
of the 300 Victorian deaths on the roads each year), but more sinister
is that because of this inflated expense, we'll probably never see
another rail line like this built at all in Melbourne (although we'll
build at public expense a wonderful road tunnel that no-one but
Lindsay Fox will use at more than 10 times the cost, though).

I suspect the real reason for grade separation is not safety, but to
cause less inconvenience to car drivers stuck for 30 seconds at these
minor crossings. Since the delays at level crossings are a roads
problem, and collisions of errant motorists with trains at level
crossings is a roads problem, and the South Morang railway reservation
existed far before any of the roads were put in place, I'm wondering
whether you can answer why the blowout in costs of construction of
train lines comes out of the public transport budget, and not at the
expense of what causes these problems in the first place - the roads?
These train lines become harder to build because of an artificial cost
inflation caused by something that will be less of a problem if only
we could built more rail lines and actually improve the Melbourne
public transport system and make it attractive to use, for once (we've
been waiting for 80 years).


Yours sincerely,


And a little while later, the reply!

July 01, 2013

Tim ConnorsYarra trail pontoon closures

I do have to admit, I had some fun writing this one:

Dear Transport Minister, Terry Mulder (Denis Napthine, Local MP Ted Baillieu, Ryan Smith MP responsible for Parks Victoria, Parks Victoria itself, and Bicycle Victoria CCed),

I am writing about the sudden closure of the Main Yarra bicycle trail around Punt Road. The floating sections of the trail have been closed for the foreseeable future because of some over-zealous lawyer at Parks Victoria who has decided that careless riders might injure themselves on the rare occasion when the pontoon is both icy, and resting on the bottom of the Yarra at very low tides, sloping sideways at a minor angle. The trail has been closed before Parks Victoria have even planned for how they're going to rectify the problem with the pontoons. Instead, the lawyers have forced riders to take to parallel streets such as Swan St (which I took tonight in the rain, negotiating the thin strip between parked cars far enough from their doors being flung out illegally by careless drivers, and the wet tram tracks beside them). Obviously, causing riders to take these detours will be very much less safe than just keeping the trail open until a plan is developed, but I can see why Parks Victoria would want to shift the legal burden away from them.

I have no faith that the pontoon will be fixed in the foreseeable future without your intervention, because of past history -- that trail has been partially closed for about 18 months out of the past 3 years due to the very important works on the freeway above (keeping the economy going, as they say, by digging ditches and filling them immediately back up again).

Since we're already wasting $15B on an east-west freeway tunnel that will do absolutely nothing to alleviate traffic congestion because the outbound (Easterly direction) freeway is already at capacity in the afternoon without the extra induced traffic this project will add, I was wondering if you could spare a few million to duplicate the only easterly bicycle trail we have, so that these sorts of incidents don't reoccur and have so much impact on riders in the future.

I do hope that this trail will be fixed in a timely fashion before myself and all other 3000-4000 cyclists who currently use the trail every day resorting to riding through any of your freeway tunnels.

Yours sincerely,

Me

April 14, 2013

Tim Connors

Oh well, if The Age aren't going to publish my Thatcher rant, I will:

Jan White (
Letters, 11 Apr) is heavily misguided if she believes that Thatcher was one of Britain's greatest leaders. For whom? By any metric 70% of Brits cared about, she was one of the worst. Any harmony, strength of character and respect Brits may be missing now would be due to her having nearly destroyed everything about British society with her Thatchernomics. Her funeral should be privatised and definitely not funded by the state as it is going to be. Instead, it could be funded by the long queue of people who want to dance on her grave.

March 21, 2013

Tim ConnorsRagin' on the road

Since The Age didn't publish my letter, my 3 readers ought to see it anyway:


Reynah Tang of the Law Institute of Victoria says that road rage offences shouldn't necessarily lead to loss of licence ("Offenders risk losing their licence", The Age, Mar 21) . He misses the point -- a vehicle is a weapon. Road ragers demonstrably do not have enough self control to drive. They have already lost their temper when in control of such a weapon, so they must never be given a licence to use that weapon again (the weapon should also be forfeited). The same is presumably true of gun murderers after their initial jail time (which road ragers rarely are given). RACV's Brian Negus also doesn't appear to realise that a driving license is a privilege, not an automatic right. You can still have all your necessary mobility without your car - it's not a human rights issue.


It was less than 200 words even dammit! But because the editor didn't check the basic arithmetic in a previous day's letter, they had to publish someone's correction.

November 18, 2012

Ben McGinnesFixed it

I've fixed the horrible errors that were sending my tweets here, it only took a few hours.

To do that I've had to disable cross-posting and it looks like it won't even work manually, so my updates will likely only occur on my own domain.

Details of the changes are here. They include better response times for my domain and no more Twitter posts on the main page, which should please those of you who hate that. Apparently that's a lot of people, but since I hate being inundated with FarceBook's crap I guess it evens out.

The syndicated feed for my site around here somewhere will get everything, but there's only one subscriber to that (last time I checked) and she's smart enough to decide how she wants to deal with that.

Ben McGinnesTweet Sometimes I amaze even myself; I remembered the pa…

Sometimes I amaze even myself; I remembered the passphrases to old PGP keys I thought had been lost to time. #crypto

Originally published at Organised Adversary. Please leave any comments there.

Ben McGinnesTweet These are the same keys I referred to in the PPAU…

These are the same keys I referred to in the PPAU #NatSecInquiry submission as being able to be used against me. #crypto

Originally published at Organised Adversary. Please leave any comments there.

Ben McGinnesTweet Now to give them their last hurrah: sign my curren…

Now to give them their last hurrah: sign my current key with them and then revoke them! #crypto

Originally published at Organised Adversary. Please leave any comments there.

October 26, 2011

Donna Benjaminheritage and hysterics

Originally published at KatteKrab. Please leave any comments there.

This gorgeous photo of The Queen in Melbourne on the Royal Tram made me smile this morning.

I've long been a proponent of an Australian Republic - but the populist hysteria of politicians, this photo, and the Kingdom of the Netherlands is actually making me rethink that position.

At least for today.  Long may she reign over us.

"Queen Elizabeth II smiles as she rides on the royal tram down St Kilda Road"
Photo from Getty Images published on theage.com.au

October 02, 2011

Donna BenjaminSticks and Stones and Speech

Originally published at KatteKrab. Please leave any comments there.

THE law does treat race differently: it is not unlawful to publish an article that insults, offends, humiliates or intimidates old people, for instance, or women, or disabled people. Professor Joseph, director of the Castan Centre for Human Rights Law at Monash University, said in principle ''humiliate and intimidate'' could be extended to other anti-discrimination laws. But historically, racial and religious discrimination is treated more seriously because of the perceived potential for greater public order problems and violence.

Peter Munro The Age  2 Oct 2011

Ahaaa. Now I get it! We've been doing it wrong. 

Racial villification is against the law because it might be more likely to lead to violence than villifying women, the elderly or the disabled.

Interesting debates and articles about free speech and discrimination are bobbing up and down in the flotsam and jetsam of the Bolt decision. Much of it seems to hinge on some kind of legal see-saw around notions of a bad law about bad words.

I've always been a proponent of the sticks and stones philosophy.  For those not familiar, it's the principle behind a children's nursery rhyme.

Sticks and Stones may break my bones
But  words will never hurt me

But I'm increasingly disturbed by the hateful culture of online comment.  I am a very strong proponent of the human right to free expression, and abhor censorship, but I'm seriously sick of "My right to free speech" being used as the ultimate excuse for people using words to denigrate, humiliate, intimidate, belittle and attack others, particularly women.

We should defend a right to free speech, but condemn hate speech when ever and where ever we see it.  Maybe we actually need to get violent to make this stop? Surely not.

September 20, 2011

Donna BenjaminQantas Pilots

Originally published at KatteKrab. Please leave any comments there.

The Qantas Pilot Safety culture is something worth fighting to protect. I read Malcolm Gladwell's Outliers whilst on board a Qantas flight recently. While Qantas itself isn't mentioned in the book, a footnote listed Australia as having the 2nd lowest Pilot Power-Distance Index (PDI) in the world. New Zealand had the lowest. The entire chapter "The Ethnic Theory of Plane Crashes" is the strongest argument I've seen which explains the Qantas safety record. The experience of pilots and relationships amongst the entire air crew is a crucial differentiating factor. Other airlines work hard to develop this culture, often needing to work against their own cultural patterns to achieve it. At Qantas, and likely at other Australian airlines too, this culture is the norm.

I want Australian Qantas Pilots flying Qantas planes. I'd like an Australian in charge too.

If you too support Qantas Pilots - go to their website, sign the petition.

Do your own reading.

G.R. Braithwaite, R.E. Caves, J.P.E. Faulkner, Australian aviation safety — observations from the ‘lucky’ countryJournal of Air Transport Management, Volume 4, Issue 1, January 1998: 55-62.

Anthony Dennis, What it takes to become a Qantas pilot news.com.au, 8 September 2011.

Ashleigh Merritt, Culture in the Cockpit: Do Hofstede’s Dimensions Replicate?  Journal of Cross-Cultural Psychology, May 2000 31: 283-30.

Matt Phillips, Malcolm Gladwell on Culture, Cockpit Communication and Plane Crashes, WSJ Blogs, 4 December 2008.

 

September 18, 2011

Donna BenjaminRegistering for LCA2012

Originally published at KatteKrab. Please leave any comments there.

linux.conf.au ballarat 2012

I am right now, at this very minute, registering for linux.conf.au in Ballarat in January. Creating my planet feed. Yep. Uhuh.

I reckon the "book a bus" feature of rego is pretty damn cool.  I won't be using it, because I'll be driving up from Melbourne. Serious kudos to the Ballarat team. Also nice to see they'll add busses from Avalon airport as well as from Tullamarine airport if there's demand.

Too cool.