Planet Russell


Planet Debian: Dirk Eddelbuettel: #0: Introducing R^4

So I had been toying with the idea of getting back to the blog and more regularly writing / posting little tips and tricks. I even started taking some notes, but because perfect is always the enemy of the good, it never quite materialized.

But the relatively broad discussion spawned by last week's short rant on Suggests != Depends made a few things clear. There appears to be an audience. It doesn't have to be long. And it doesn't have to be too polished.

So with that, let's get the blogging back from micro-blogging.

This note forms post zero of what will be a new segment I call R^4, which is shorthand for relatively random R rambling.

Stay tuned.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Google Adsense: Manage the risks associated with user comments

As a publisher, you can drive discussion and increase reader engagement by using user comments. At their best, comments enable your readers to share their perspectives and learn from each others’ experiences. By creating a community of conversation around your articles, your readers become more engaged and find your site more relevant and beneficial.

Alas, not every commenter is well-intentioned or well-informed. Consequently, comment sections can devolve into a place where social norms are tossed aside to further an agenda or to air a grievance. These negative, rude, or abusive comments take away from the article and ultimately harm your brand. Comments that violate Google policies can also cause your site to no longer be eligible to show Google ads.

So, as a publisher, how can you keep comments — or, more generally, user-generated content (UGC) — policy compliant so that your site can continue to monetize with Google?

First, understand that as a publisher, you are responsible for ensuring that all comments on your site or app comply with all of our applicable program policies on all of the pages where Google ad code appears. This includes comments that are added to your pages by users, which can sometimes contain hate speech or explicit text.

Knowing this, please read Strategies for managing user-generated content. Make sure you understand how to mitigate risk before you enable comments or other forms of user-generated content. Managing comments on your site pages is your responsibility, so make sure you know what you’re getting into. For example, you’ll need to review and moderate comments consistently to ensure policy compliance so that Google ads can run. We published an infographic in 2016 which offers a quick all-in-one glance at policy compliance.

Another option:
If you are unable to put into place strong and responsive controls over your comments, we strongly encourage you to make a simple design change: put comments on their own page, and don’t run ads on that page. Otherwise, unreviewed and unmoderated offensive or inappropriate user comments can show right next to your publisher content. This can damage your brand, offend your users, and cause you to violate Google policies.

Here’s one way to separate comments and content:
At the end of your content, place a call to action, such as “User Comments” or “View Comments”, which lets users open the comments in a new page. On that new page, make sure not to place any Google ad tags, so that no ads serve next to those comments.

At Google, we believe in fostering an environment where users, advertisers, and publishers can all thrive in a healthy digital advertising ecosystem. By valuing each party equally, we help ensure the sustainability of our industry. We publish Help Center materials, write blog posts, speak at industry events, provide publisher forums and host events at our offices to help our publishers succeed in an ever-changing environment.
Posted by: John Brown, Head of Publisher Policy Communications

Cryptogram: The TSA's Selective Laptop Ban

Last Monday, the TSA announced a peculiar new security measure to take effect within 96 hours. Passengers flying into the US on foreign airlines from eight Muslim countries would be prohibited from carrying aboard any electronics larger than a smartphone. They would have to be checked and put into the cargo hold. And now the UK is following suit.

It's difficult to make sense of this as a security measure, particularly at a time when many people question the veracity of government orders, but other explanations are either unsatisfying or damning.

So let's look at the security aspects of this first. Laptop computers aren't inherently dangerous, but they're convenient carrying boxes. This is why, in the past, TSA officials have demanded passengers turn their laptops on: to confirm that they're actually laptops and not laptop cases emptied of their electronics and then filled with explosives.

Forcing a would-be bomber to put larger laptops in the plane's hold is a reasonable defense against this threat, because it increases the complexity of the plot. Both the shoe-bomber Richard Reid and the underwear bomber Umar Farouk Abdulmutallab carried crude bombs aboard their planes with the plan to set them off manually once aloft. Setting off a bomb in checked baggage is more work, which is why we don't see more midair explosions like Pan Am Flight 103 over Lockerbie, Scotland, in 1988.

Security measures that restrict what passengers can carry onto planes are not unprecedented either. Airport security regularly responds to both actual attacks and intelligence regarding future attacks. After the liquid bombers were captured in 2006, the British banned all carry-on luggage except passports and wallets. I remember talking with a friend who traveled home from London with his daughters in those early weeks of the ban. They reported that airport security officials confiscated every tube of lip balm they tried to hide.

Similarly, the US started checking shoes after Reid, installed full-body scanners after Abdulmutallab and restricted liquids in 2006. But all of those measures were global, and most lessened in severity as the threat diminished.

This current restriction implies some specific intelligence of a laptop-based plot and a temporary ban to address it. However, if that's the case, why only certain non-US carriers? And why only certain airports? Terrorists are smart enough to put a laptop bomb in checked baggage from the Middle East to Europe and then carry it on from Europe to the US.

Why not require passengers to turn their laptops on as they go through security? That would be a more effective security measure than forcing them to check them in their luggage. And lastly, why is there a delay between the ban being announced and it taking effect?

Even more confusing, the New York Times reported that "officials called the directive an attempt to address gaps in foreign airport security, and said it was not based on any specific or credible threat of an imminent attack." The Department of Homeland Security FAQ page makes this general statement, "Yes, intelligence is one aspect of every security-related decision," but doesn't provide a specific security threat. And yet a report from the UK states the ban "follows the receipt of specific intelligence reports."

Of course, the details are all classified, which leaves all of us security experts scratching our heads. On the face of it, the ban makes little sense.

One analysis painted this as a protectionist measure targeted at the heavily subsidized Middle Eastern airlines by hitting them where it hurts the most: high-paying business class travelers who need their laptops with them on planes to get work done. That reasoning makes more sense than any security-related explanation, but doesn't explain why the British extended the ban to UK carriers as well. Or why this measure won't backfire when those Middle Eastern countries turn around and ban laptops on American carriers in retaliation. And one aviation official told CNN that an intelligence official informed him it was not a "political move."

In the end, national security measures based on secret information require us to trust the government. That trust is at historic low levels right now, so people both in the US and other countries are rightly skeptical of the official unsatisfying explanations. The new laptop ban highlights this mistrust.

This essay previously appeared on

EDITED TO ADD: Here are two essays that look at the possible political motivations, and fallout, of this ban. And the EFF rightly points out that letting a laptop out of your hands and sight is itself a security risk -- for the passenger.

Sociological Images: #ThanksForTyping – Notes of Gratitude and the History of Women’s Anonymity in Knowledge Production

Knowledge production is a collective endeavor. Individuals get named as authors of studies and on the covers of books and journal articles. But little knowledge is produced in such a vacuum that it can actually be attributed to only those whose names are associated with the final product. Bruce Holsinger, a literary scholar at the University of Virginia, came up with an interesting way of calling attention to some of women’s invisible labor in this process–typing their husbands’ manuscripts.

Holsinger noted a collection of notes written by husbands to their wives thanking them for typing the entirety of their manuscripts (dissertations, books, articles, etc.), but not actually explicitly naming them in the acknowledgement. It started with five tweets and a hashtag: #ThanksForTyping.

Typing a manuscript is a tremendous task – particularly when revisions require re-typing everything (typewriters, not computers). And, though they are thanked here, it’s a paltry bit of gratitude when you compare it with the task for which they are being acknowledged. They’re anonymous, their labor is invisible, but they are responsible for transmitting men’s scholarship into words.

Needless to say, the hashtag prompted a search that uncovered some of the worst offenders. The acknowledgements all share a few things in common: they are directed at wives, do not name them (though often name and thank others alongside), and they are thanked for this enormous task (and sometimes a collection of others along with it). Here are a few of the worst offenders:

Indeed, typing was one of those tasks to which women were granted access and in which women were offered formal training. Though, some of these are notes of gratitude to wives who have received education far beyond typing. And many of the acknowledgements above hint that more than mere transcription was often offered – these unnamed women were also offering ideas, playing critical roles in one of the most challenging elements of scientific inquiry and discovery – presenting just what has been discovered and why it matters.

One user on Twitter suggested examining it in Google’s ngram tool to see how often “thanks to my wife who,” “thanks to my wife for” and the equivalents adding “husband” have appeared in books. The use of each phrase doesn’t mean the women were not named, but it follows what appears to be a standard practice in many of the examples above – the norm of thanking your wife for typing your work, but not naming her in the process.

Of course, these are only examples of anonymous women contributing to knowledge production through typing. Women’s contributions toward all manner of social, cultural, political, and economic life have been systemically erased, under-credited, or made anonymous.  Each year Mother Jones shares a list of things invented by women for which men received credit (here’s last year’s list).

Knowledge requires work to be produced. Books don’t fall out of people’s heads ready-formed. And the organization of new ideas into written form is treated as a perfunctory task in many of the acknowledgements above–menial labor that people with “more important” things to do ought to avoid if they can. The anonymous notes of gratitude perform a kind of “work” for these authors beyond expressing thanks for an arduous task–these notes also help frame that work as less important than it often is.

Tristan Bridges, PhD is a professor at The College at Brockport, SUNY. He is the co-editor of Exploring Masculinities: Identity, Inequality, Continuity, and Change with C.J. Pascoe and studies gender and sexual identity and inequality. You can follow him on Twitter here. Tristan also blogs regularly at Inequality by (Interior) Design.


Worse Than Failure: CodeSOD: The Refactoring

I have certain mantras that I use to guide my programming. They generally revolve around this theme: "Thinking is hard, and I'm not very good at it; every block of code should be simple and obvious, because if it makes me think, I'll probably screw it up and break something." It's a good rule for me, and a good general guideline, but it's a little vague to implement as a policy.

Erika’s company wanted to implement this idea as a policy, so they set a limit on how many lines could be in a single method. The thinking was that if each method was short (say, under 100 lines), it would automatically be simple(r), right?

Well, Garret, down the hall, wrote a method that was three hundred lines long. During a code review, he was told to refactor it to simplify the logic and comply with the policy. So he did.

public void Update()

Planet Debian: Elena 'valhalla' Grandi: New pajama


I may have been sewing myself a new pajama.


It was plagued with issues: one of the sleeves is wrong side out and I only realized it when everything was almost done (luckily the pattern is symmetric and it is barely noticeable), the swirl moved while I was sewing it on (and the sewing machine got stuck multiple times: next time I'm using interfacing, full stop), and it's a bit deformed. But it's done.

For the swirl, I used Inkscape to Simplify (Ctrl-L) the original Debian Swirl a few times, removed the isolated bits, adjusted some spline nodes by hand and printed it on paper. I then cut it out, used water-soluble glue to attach it to the wrong side of a scrap of red fabric, cut the fabric, removed the paper and then pinned and sewed the fabric on the pajama top.
As mentioned above, the next time I'm doing something like this, some interfacing will be involved somewhere, to keep me sane and the sewing machine happy.

Blogging, because it is somewhat relevant to Free Software :) and there are even sources, under a DFSG-Free license :)


Planet Debian: Dirk Eddelbuettel: RcppTOML 0.1.2

A new release of RcppTOML is now on CRAN. This release fixes a few parsing issues for less frequently-used inputs: vectors of boolean or date(time) types, as well as table array input.

RcppTOML brings TOML to R. TOML is a file format that is most suitable for configurations, as it is meant to be edited by humans but read by computers. It emphasizes strong readability for humans while at the same time supporting strong typing as well as immediate and clear error reports. On small typos you get parse errors, rather than silently corrupted garbage. Much preferable to any and all of XML, JSON or YAML -- though sadly these may be too ubiquitous now. TOML is making good inroads with newer and more flexible projects such as the Hugo static blog compiler, or the Cargo system of Crates (aka "packages") for the Rust language.

Changes in version 0.1.2 (2017-03-26)

  • Dates and Datetimes in arrays in the input now preserve their types instead of converting to numeric vectors (#13)

  • Boolean vectors are also correctly handled (#14)

  • TableArray types are now stored as lists in a single named list (#15)

  • The file was expanded with an example and screenshot.

  • Added file init.c with calls to R_registerRoutines() and R_useDynamicSymbols(); also use .registration=TRUE in useDynLib in NAMESPACE

  • Two example files were updated.

Courtesy of CRANberries, there is a diffstat report for this release.

More information is on the RcppTOML page. Issues and bug reports should go to the GitHub issue tracker.


Harald Welte: OsmoCon 2017 Updates: Travel Grants and Schedule


April 21st is approaching fast, so here are some updates. I'm particularly happy that we now have travel grants available. So if travel expenses were preventing you from attending so far: this excuse is no longer valid!

Get your ticket now, before it is too late. There's a limited number of seats available.

OsmoCon 2017 Schedule

The list of talks for OsmoCon 2017 has been available for quite some weeks, but today we finally published the first actual schedule.

As you can see, the day is fully packed with talks about Osmocom cellular infrastructure projects. We had to cut some talk slots short (30min instead of 45min), but I'm confident that it is good to cover a wider range of topics, while at the same time avoiding fragmenting the audience with multiple tracks.

OsmoCon 2017 Travel Grants

We are happy to announce that we have received donations that permit us to provide travel grants!

This means that any attendee who is otherwise not able to cover their travel to OsmoCon 2017 (e.g. because their interest in Osmocom is not related to their work, or because their employer doesn't pay the travel expenses) can now apply for such a travel grant.

For more details see OsmoCon 2017 Travel Grants and/or contact

OsmoCon 2017 Social Event

Tech Talks are nice and fine, but what many people enjoy even more at conferences is the informal networking combined with good food. For this, we have the social event at night, which is open to all attendees.

See more details about it at OsmoCon 2017 Social Event.

Planet Linux Australia: David Rowe: AMBE+2 and MELPe 600 Compared to Codec 2

Yesterday I was chatting on the #freedv IRC channel, and a good question was asked: how close is Codec 2 to AMBE+2 ? Turns out – reasonably close. I also discovered, much to my surprise, that Codec 2 700C is better than MELPe 600!


[Two tables of audio samples appeared here. The first compares nine speech samples across Original, AMBE+2 3000, AMBE+ 2400, Codec 2 3200 and Codec 2 2400; the second compares the same nine samples across Original, MELPe 600 and Codec 2 700C. The per-sample “Listen” links are omitted in this text version.]

Here are all the samples in one big tar ball.


I don’t have an AMBE or MELPe codec handy, so I used the samples from the DVSI and DSP Innovations web sites. I passed the original “DAMA” speech samples found on these sites through Codec 2 (codec2-dev SVN revision 3053) at various bit rates. It turns out the DAMA samples were the same for AMBE and MELPe, which was handy.

These particular samples are “kind” to codecs – I consistently get good results with them when I test with Codec 2. I’m guessing they also allow other codecs to be favorably demonstrated. During Codec 2 development I make a point of using “pathological” samples such as hts1a, cg_ref, kristoff, mmt1 that tend to break Codec 2. There are some samples of AMBE and MELP using my samples on the Codec 2 page.

I usually listen to samples through a laptop speaker, as I figure it’s close to the “use case” of a PTT radio. Small speakers do mask codec artifacts, making them sound better. I also tried a powered loudspeaker with the samples above. Through the loudspeaker I can hear AMBE reproducing the pitch fundamental – a bass note that can be heard on some males (e.g. sample 7), whereas Codec 2 is filtering that out.

I feel AMBE is a little better; Codec 2 is a bit clicky or impulsive (e.g. on sample 1). However it’s not far behind. In a digital radio application, with a small speaker and some acoustic noise about, I feel the casual listener wouldn’t discern much difference. Try replaying these samples through your smart-phone’s browser at an airport and let me know if you can tell them apart!

On the other hand, I think Codec 2 700C sounds better than MELPe 600. Codec 2 700C is more natural. To my ear MELPe has very coarse quantisation of the pitch, hence the “Mr Roboto” sing-song pitch jumps. The 700C level is a bit low, an artifact/bug to do with the post filter. Must fix that some time. As a bonus Codec 2 700C also has lower algorithmic delay, around 40ms compared to MELPe 600’s 90ms.

Curiously, Codec 2 uses just 1 voicing bit which means either voiced or unvoiced excitation in each frame. xMBE’s claim to fame (and indeed MELP) over simpler vocoders is the use of mixed excitation. Some of the spectrum is voiced (regular pitch harmonics), some unvoiced (noise like). This suggests the benefits of mixed excitation need to be re-examined.
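The distinction can be illustrated with a toy numpy sketch (mine, not code from Codec 2, MBE or MELP): with a single voicing bit, a whole frame of excitation is either a harmonic series or noise, while mixed excitation splits the spectrum at a cutoff, keeping pitch harmonics below it and noise above it.

```python
import numpy as np

FS = 8000          # sample rate (Hz)
N = 160            # one 20 ms frame
F0 = 100.0         # pitch fundamental (Hz)

t = np.arange(N) / FS
rng = np.random.default_rng(0)
harmonics = np.arange(1, int((FS / 2) // F0) + 1)

def binary_excitation(voiced: bool) -> np.ndarray:
    """One voicing bit: the whole frame is harmonics or noise."""
    if voiced:
        return sum(np.sin(2 * np.pi * h * F0 * t) for h in harmonics)
    return rng.standard_normal(N)

def mixed_excitation(cutoff_hz: float) -> np.ndarray:
    """MBE/MELP style: voiced harmonics below the cutoff,
    noise filling the band above it."""
    voiced_part = sum(np.sin(2 * np.pi * h * F0 * t)
                      for h in harmonics if h * F0 < cutoff_hz)
    # High-pass some noise so it only occupies the unvoiced band.
    spectrum = np.fft.rfft(rng.standard_normal(N))
    freqs = np.fft.rfftfreq(N, 1 / FS)
    spectrum[freqs < cutoff_hz] = 0
    noise_part = np.fft.irfft(spectrum, N)
    return voiced_part + noise_part

frame = mixed_excitation(cutoff_hz=1500.0)
print(frame.shape)
```

The frame size, pitch and cutoff values here are arbitrary illustration choices, not Codec 2 parameters.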

I haven’t finished developing Codec 2. In particular Codec 2 700C is very much a “first pass”. We’ve had a big breakthrough this year with 700C and development will continue, with benefits trickling up to other modes.

However the 1300, 2400, 3200 modes have been stable for years and will continue to be supported.

Next Steps

Here is the blog post that kicked off Codec 2 – way back in 2009. Here is a video of my 2012 Codec 2 talk that explains the motivations, IP issues around codecs, and a little about how Codec 2 works (slides here).

What I spoke about then is still true. Codec patents and license fees are a useless tax on business and stifle innovation. Proprietary codecs borrow as much as 95% of their algorithms from the public domain – which are then sold back to you. I have shown that open source codecs can meet and even exceed the performance of closed source codecs.

Wikipedia suggests that AMBE license fees range from USD$100k to USD$1M. For “one license fee” we can improve Codec 2 so it matches AMBE+2 in quality at 2400 and 3000 bit/s. The results will be released under the LGPL for anyone to use, modify, improve, and inspect at zero cost. Forever.

Maybe we should crowd source such a project?

Command Lines

This is how I generated the Codec 2 wave files:

# Encode sample 9 at 3200 bit/s, decode it again, then use sox to wrap the
# raw 8 kHz, signed 16-bit output in a WAV header:
~/codec2-dev/build_linux/src/c2enc 3200 9.wav - | \
  ~/codec2-dev/build_linux/src/c2dec 3200 - - | \
  sox -t raw -r 8000 -s -2 - 9_codec2_3200.wav


DVSI AMBE sample page

DSP Innovations, MELPe samples. Can anyone provide me with TWELP samples from these guys? I couldn’t find any on the web that include the input, uncoded source samples.

Planet Linux Australia: OpenSTEM: Trying an OpenSTEM unit without a subscription

We have received quite a few requests for this option, so we’ve made it possible. As we understand it, in many cases an individual teacher wants to try our materials (often on behalf of the school, as a trial) but the teacher has to fund this from their classroom budget, so we appreciate they need to limit their initial outlay.

While purchasing units with an active subscription still works out cheaper (we haven’t changed that pricing), we have tweaked our online store to now also allow the purchase of individual unit bundles, from as little as $49.50 (inc. GST) for the Understanding Our World™ HASS+Science program units. That’s a complete term bundle with teacher handbook, student workbook, assessment guide, model answers and curriculum mapping, as well as all the base resource PDFs needed for that unit! After purchase, the PDF materials can be downloaded from the site (optionally many files together in a ZIP).

We’d love to welcome you as a new customer! From experience we know that you’ll love our materials. The exact pricing difference (between subscription and non-subscription) depends on the type of bundle (term unit, year bundle, or multi-year bundle) and is indicated per item.

Try OpenSTEM today! Browse our teacher unit bundles (Foundation Year to Year 6).

This includes units for Digital Technologies, the Ginger Beer Science project, as well as for our popular Understanding Our World™ HASS+Science program.


Planet Debian: Dirk Eddelbuettel: RApiDatetime 0.0.2

Two days after the initial 0.0.1 release, a new version of RApiDatetime has just arrived on CRAN.

RApiDatetime provides six entry points for C-level functions of the R API for Date and Datetime calculations. The functions asPOSIXlt and asPOSIXct convert between long and compact datetime representation, formatPOSIXlt and Rstrptime convert to and from character strings, and POSIXlt2D and D2POSIXlt convert between Date and POSIXlt datetime. These six functions are all fairly essential and useful, but not one of them was previously exported by R.
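As a loose analogy only (Python, not the RApiDatetime C API): the long/compact split is the same one Python's standard time module makes between a broken-down struct_time and epoch seconds, with string formatting and parsing alongside.

```python
import calendar
import time

compact = 1490400000          # seconds since the epoch (POSIXct-like)

# Compact -> long: a broken-down calendar struct (POSIXlt-like),
# roughly what asPOSIXlt does.
longrep = time.gmtime(compact)
assert (longrep.tm_year, longrep.tm_mon, longrep.tm_mday) == (2017, 3, 25)

# Long -> compact again, roughly asPOSIXct.
assert calendar.timegm(longrep) == compact

# Long -> string and back, roughly formatPOSIXlt / Rstrptime.
s = time.strftime("%Y-%m-%d %H:%M:%S", longrep)
parsed = time.strptime(s, "%Y-%m-%d %H:%M:%S")
assert parsed.tm_mday == longrep.tm_mday
```

The R functions additionally handle the Date type (POSIXlt2D / D2POSIXlt), which has no direct counterpart in this sketch.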

Josh Ulrich took one hard look at the package -- and added the one line we needed to enable the Windows support that was missing in the initial release. We now build on all platforms supported by R and CRAN. Otherwise, I just added a NEWS file and called it a bugfix release.

Changes in RApiDatetime version 0.0.2 (2017-03-25)

  • Windows support has been added (Josh Ulrich in #1)

Changes in RApiDatetime version 0.0.1 (2017-03-23)

  • Initial release with six accessible functions

Courtesy of CRANberries, there is a comparison to the previous release. More information is on the rapidatetime page.

For questions or comments please use the issue tracker off the GitHub repo.


Planet Debian: Bits from Debian: Debian Project Leader elections 2017

It's that time of year again for the Debian Project: the elections of its Project Leader!

The Project Leader position is described in the Debian Constitution.

Two Debian Developers are running this year to become Project Leader: Mehdi Dogguy, who has held the office for the last year, and Chris Lamb.

We are in the middle of the campaigning period that will last until the end of April 1st. The candidates and Debian contributors are already engaging in debates and discussions on the debian-vote mailing list.

The voting period starts on April 2nd, and during the following two weeks, Debian Developers can vote to choose the person that will fit that role for one year.

The results will be published on April 16th, with the term for the new project leader starting the following day.

Planet Debian: Russ Allbery: Spring haul

Work has been hellishly busy lately, so that's pretty much all I've been doing. The major project I'm working on should be basically done in the next couple of weeks, though (fingers crossed), so maybe I'll be able to surface a bit more after that.

In the meantime, I'm still acquiring books I don't have time to read, since that's my life. In this case, two great Humble Book Bundles were too good of a bargain to pass up. There are a bunch of books in here that I already own in paperback (and hence showed up in previous haul posts), but I'm running low on shelf room, so some of those paper copies may go to the used bookstore to make more space.

Kelley Armstrong — Lost Souls (sff)
Clive Barker — Tortured Souls (horror)
Jim Butcher — Working for Bigfoot (sff collection)
Octavia E. Butler — Parable of the Sower (sff)
Octavia E. Butler — Parable of the Talents (sff)
Octavia E. Butler — Unexpected Stories (sff collection)
Octavia E. Butler — Wild Seed (sff)
Jacqueline Carey — One Hundred Ablutions (sff)
Richard Chizmar — A Long December (sff collection)
Jo Clayton — Skeen's Leap (sff)
Kate Elliott — Jaran (sff)
Harlan Ellison — Can & Can'tankerous (sff collection)
Diana Pharaoh Francis — Path of Fate (sff)
Mira Grant — Final Girls (sff)
Elizabeth Hand — Black Light (sff)
Elizabeth Hand — Saffron & Brimstone (sff collection)
Elizabeth Hand — Wylding Hall (sff)
Kevin Hearne — The Purloined Poodle (sff)
Nalo Hopkinson — Skin Folk (sff)
Katherine Kurtz — Camber of Culdi (sff)
Katherine Kurtz — Lammas Night (sff)
Joe R. Lansdale — Fender Lizards (mainstream)
Robert McCammon — The Border (sff)
Robin McKinley — Beauty (sff)
Robin McKinley — The Hero and the Crown (sff)
Robin McKinley — Sunshine (sff)
Tim Powers — Down and Out in Purgatory (sff)
Cherie Priest — Jacaranda (sff)
Alastair Reynolds — Deep Navigation (sff collection)
Pamela Sargent — The Shore of Women (sff)
John Scalzi — Miniatures (sff collection)
Lewis Shiner — Glimpses (sff)
Angie Thomas — The Hate U Give (mainstream)
Catherynne M. Valente — The Bread We Eat in Dreams (sff collection)
Connie Willis — The Winds of Marble Arch (sff collection)
M.K. Wren — Sword of the Lamb (sff)
M.K. Wren — Shadow of the Swan (sff)
M.K. Wren — House of the Wolf (sff)
Jane Yolen — Sister Light, Sister Dark (sff)

Planet Debian: Eddy Petrișor: LVM: Converting root partition from linear to raid1 leads to boot failure... and how to recover

I have a system which has 3 distinct HDDs used as physical volumes for Linux LVM. One logical volume is the root partition and it was initially created as a linear LV (vg0/OS).
Since I have PV redundancy, I thought it might be a good idea to convert the root LV from linear to raid1 with 2 mirrors.

WARNING: It seems an LVM raid1 logical volume for / is not supported with grub2, at least not with Ubuntu's 2.02~beta2-9ubuntu1.6 (14.04LTS) or Debian Jessie's grub-pc 2.02~beta2-22+deb8u1!

So I did this:
lvconvert -m2 --type raid1 vg0/OS

Then I restarted to find myself at the 'grub rescue>' prompt.

The initial problem was seen on an Ubuntu 14.04 LTS (aka trusty) system, but I reproduced it on a VM with Debian Jessie.

I downloaded the Super Grub2 Disk and tried to boot the VM. After choosing the option to load the LVM and RAID support, I was able to boot my previous system.

I tried several times to reinstall GRUB, thinking that was the issue, but I always got this kind of error:

/usr/sbin/grub-probe: error: disk `lvmid/QtJiw0-wsDf-A2zh-2v2y-7JVA-NhPQ-TfjQlN/phCDlj-1XAM-VZnl-RzRy-g3kf-eeUB-dBcgmb' not found.

In the end, after digging for more than 4 hours for answers, I decided I might be able to revert the config to the linear configuration, from the (initramfs) prompt.

Initially the LV was inactive, so I activated it:

lvchange -a y /dev/vg0/OS

Then restored the LV to linear:

lvconvert -m0 vg0/OS

Then I tried to reboot without reinstalling GRUB, just for kicks, which succeeded.

In order to confirm this was the issue, I redid the whole thing, and indeed, with a raid1 root, I always got the lvmid error.

I'll have to check on Monday at work if I can revert the Ubuntu 14.04 system the same way, but I suspect I will have no issues.

Is it true that root on lvm-raid1 is not supported?

Cryptogram: Commenting Policy for This Blog

Over the past few months, I have been watching my blog comments decline in civility. I blame it in part on the contentious US election and its aftermath. It's also a consequence of not requiring visitors to register in order to post comments, and of our tolerance for impassioned conversation. Whatever the causes, I'm tired of it. Partisan nastiness is driving away visitors who might otherwise have valuable insights to offer.

I have been engaging in more active comment moderation. What that means is that I have been quicker to delete posts that are rude, insulting, or off-topic. This is my blog. I consider the comments section as analogous to a gathering at my home. It's not a town square. Everyone is expected to be polite and respectful, and if you're an unpleasant guest, I'm going to ask you to leave. Your freedom of speech does not compel me to publish your words.

I like people who disagree with me. I like debate. I even like arguments. But I expect everyone to behave as if they've been invited into my home.

I realize that I sometimes express opinions on political matters; I find they are relevant to security at all levels. On those posts, I welcome on-topic comments regarding those opinions. I don't welcome people pissing and moaning about the fact that I've expressed my opinion on something other than security technology. As I said, it's my blog.

So, please... Assume good faith. Be polite. Minimize profanity. Argue facts, not personalities. Stay on topic. If you want a model to emulate, look at Clive Robinson's posts.

Schneier on Security is not a professional operation. There's no advertising, so no revenue to hire staff. My part-time moderator -- paid out of my own pocket -- and I do what we can when we can. If you see a comment that's spam, or off-topic, or an ad hominem attack, flag it and be patient. Don't reply or engage; we'll get to it. And we won't always post an explanation when we delete something.

My own stance on privacy and anonymity means that I'm not going to require commenters to register a name or e-mail address, so that isn't an option. And I really don't want to disable comments.

I dislike having to deal with this problem. I've been proud and happy to see how interesting and useful the comments section has been all these years. I've watched many blogs and discussion groups descend into toxicity as a result of trolls and drive-by ideologues derailing the conversations of regular posters. I'm not going to let that happen here.

CryptogramSecond WikiLeaks Dump of CIA Documents

There are more CIA documents up on WikiLeaks. It seems to be mostly MacOS and iOS -- including exploits that are installed on the hardware before they're delivered to the customer.

News articles.

EDITED TO ADD (3/25): Apple claims that the vulnerabilities are all fixed. Note that there are almost certainly other Apple vulnerabilities in the documents still to be released.

Planet DebianUrvika Gola: Speaking at FOSSASIA’17 | Seasons of Debian : Summer of Code & Winter of Outreachy

I got an amazing chance to speak at FOSSASIA 2017 held at Singapore on “Seasons of Debian – Summer of Code and Winter of Outreachy“. I gave a combined talk with my co-speaker Pranav Jain, who contributed to Debian through GSoC. We talked about two major open source initiatives – Outreachy and Google Summer of Code and the work we did on a common project – Lumicall under Debian.


The excitement started even before the first day! On 16th March, there was a speakers meetup at the Microsoft office in Singapore. There, I got the chance to connect with other speakers and learn about their work. The meetup concluded with a Microsoft office tour! As a student, it was very exciting to see first-hand the office of a company I had only dreamt of being at.

On 17th March, i.e. the first day of the three-day-long conference, I met Hong Phuc Dang, founder of FOSSASIA. She is very kind, and just talking to her made me cheerful!
Meeting so many great developers from different organizations was exciting.

18th March was the day of our talk! I was a bit nervous to speak in front of amazing developers, but that’s how you grow 🙂 Our talk was preceded by a lovely introduction by Mario Behling.



I talked about how the Outreachy programme has made a significant impact in increasing the participation of women in open source, with one such woman being me.

I also talked about the Android programming concepts which I used while adding new features to Lumicall. Pranav talked about the Debian organization and how to get started with GSoC by sharing his journey!

After our talk, students approached us with questions about how to participate in Outreachy and GSoC. I felt that a lot more students were receptive to this new opportunity.

Our own talk was part of the mini DebConf track. Under this track, there were two other amazing sessions namely, “Debian – The best Linux distribution” and “Open Build Service in Debian”.

I gained a wide variety of experiences from FOSSASIA: I learned how to speak on a big stage, learned from other interesting talks, shared ideas with smart developers, and got to see an exciting venue and a wonderful city!

I would not have been able to experience this without the continuous support of Debian and Outreachy! 🙂




TEDA new civic gathering, awarding disobedience, and the case for resettlement

As usual, the TED community has lots of news to share this week. Below, some highlights.

A new civic gathering. To cope with political anxiety after the 2016 elections, Eric Liu has started a gathering called Civic Saturday. He explained the event in The Atlantic as “a civic analogue to church: a gathering of friends and strangers in a common place to nurture a spirit of shared purpose. But it’s not about church religion or synagogue or mosque religion. It’s about American civic religion—the creed of liberty, equality, and self-government that truly unites us.” The gatherings include quiet meditation, song, readings of civic texts, and yes, a sermon. The next Civic Saturday happens April 8 in Seattle — and Eric’s nonprofit Citizen University encourages you to start your own. (Watch Eric’s TED Talk)

Medical research facilitated by apps. The Scripps Translational Science Institute is teaming up with WebMD for a comprehensive study of pregnancy using the WebMD pregnancy app.  By asking users to complete surveys and provide data on their pregnancy, the study will shed light on “one of the least studied populations in medical research,” says STSI director Dr. Eric Topol. The researchers hope the results will provide insights that medical professionals can use to avoid pregnancy complications. (Watch Eric’s TED Talk)

There’s a new type of cloud! While cloud enthusiasts have documented the existence of a peculiar, wave-like cloud formation for years, there had been no official recognition of it until now. Back in 2009, Gavin Pretor-Pinney, of the Cloud Appreciation Society, proposed to the World Meteorological Organization that they add the formation to the International Cloud Atlas, the definitive encyclopedia of clouds, which hadn’t been updated since 1987. On March 24, the organization released an updated version of the Atlas, complete with an entry for the type of cloud Pretor-Pinney had proposed adding. The cloud was named asperitas, meaning “roughness.” (Watch Gavin’s TED Talk)

What neuroscience can teach law. Criminal statutes require juries to assess whether or not the defendant was aware that they were committing a crime, but a jury’s ability to accurately determine the defendant’s mental state at the time of the crime is fraught with problems. Enter neuroscience. Read Montague and colleagues are using neuroimaging and machine learning techniques to study if and how brain activity differs for the two mental states. The research is in early stages, but continued research may help shed scientific light on a legally determined boundary. (Watch Read’s TED Talk)

Why we should award disobedience. After announcing the $250,000 prize last summer, the MIT Media Lab has begun to accept nominations for its first-ever Disobedience Award. Open to groups and individuals engaged in an extraordinary example of constructive disobedience, the prize honors work that undermines traditional structures and institutions in a positive way, from politics and science to advocacy and art. “You don’t change the world by doing what you’re told,” Joi Ito notes, a lesson that has been a long-held practice for the MIT group, who also recently launched their own initiative for space exploration. Nominations for the award are open now through May 1. (Watch Joi’s TED Talk)

The next generation of biotech entrepreneurs. The Innovative Genomics Institute, led by Jennifer Doudna, announced the winners of its inaugural Entrepreneurial Fellowships. Targeted at early-career scientists, the fellowship provides research funding plus business training and mentorship, an entrepreneurial focus that helps scientists create practical impact through commercialization of their work. “I’ve seen brilliant ideas that fizzle out because startup companies just can’t break into the competitive biotechnology scene,” Doudna says. “With more time to develop their ideas and technology, our fellows will have the head start needed to earn the confidence of investors.” (Watch Jennifer’s TED Talk)

The case for resettlement. Since the 1980s, the dominant international approach for the resettlement of refugees has been the humanitarian silo, a camp often located in countries that border war zones. But such host countries are often ill-equipped to bear the brunt. Indeed, many countries place severe restrictions on refugee participation within their communities and labor markets, creating what Alexander Betts describes in The Guardian as an indefinite, even unavoidable, dependency on aid. In this thought-provoking excerpt of his co-authored book, Betts outlines an economic argument for refugee resettlement, arguing that “refugees need to be understood as much in terms of development and trade as humanitarianism.” (Watch Alexander’s TED Talk)

Have a news item to share? Write us at and you may see it included in this weekly round-up.

CryptogramFriday Squid Blogging: Squid from Utensils

Available on eBay.

As usual, you can also use this squid post to talk about the security stories in the news that I haven't covered.

Planet DebianGunnar Wolf: Dear lazyweb: How would you visualize..?

Dear lazyweb,

I am trying to find a good way to present the categorization of several cases I studied, with a fitting graph. I am rating several vulnerabilities / failures according to James Cebula et al.'s paper, A Taxonomy of Operational Cyber Security Risks; this is a somewhat deep taxonomy, with 57 end items organized in a three-level-deep hierarchy. Copying a table from the cited paper (click to display it full-sized):

My categorization is binary: I care only whether it falls within a given category or not. My first stab at this was to represent each case using a star or radar graph. As an example:

As you can see, to a "bare" star graph I added a background color for each top-level category (blue for actions of people, green for systems and technology failures, red for failed internal processes, and gray for external events), and printed only the labels for the second-level categories; for an accurate reading of the graphs, you have to refer to the table and count bars. And, yes, according to the Engineering Statistics Handbook:

Star plots are helpful for small-to-moderate-sized multivariate data sets. Their primary weakness is that their effectiveness is limited to data sets with less than a few hundred points. After that, they tend to be overwhelming.

I strongly agree with the above statement — and saying that "a few hundred points" can be understood is itself an overstatement: 50 points are already too much. Now, trying to increase this graph's usability, I came across the Sunburst diagram. One of the proponents of this diagram, John Stasko, has written quite a bit about it.

Now... How to create my beautiful Sunburst diagram? That's a tougher one. Even though the page I linked to in the (great!) Data visualization catalogue presents even some free-as-in-software tools to do this... They are Javascript projects that will render their beautiful plots (even including an animation)... To the browser. I need them for a static (i.e. to be printed) document. Yes, I can screenshot and all, but I want them to be automatically generated, so I can review and regenerate them all automatically. Oh, I could just write JSON and use SaaS sites such as Aculocity to do the heavy-lifting, but if you know me, you will understand why I don't want to.

So... I set out to find a Gunnar-approved way to display the information I need. Now, as the Protovis documentation says, an icicle is simply a sunburst transformed from polar to Cartesian coordinates... and I came to a similar conclusion: the tools I found are not what I need. But an icicle graph seems much simpler to produce, so I fired up my Emacs and started writing it in Ruby, with RMagick and RVG... then decided to try a different way. This is my result so far:

So... What do you think? Does this look right to you? Clearer than the previous one? Worse? Do you have any idea on how I could make this better?

Oh... You want to tell me there is something odd about it? Well, yes, of course! I still need to tweak it quite a bit. Would you believe me if I told you this is not really a left-to-right icicle graph, but rather a strangely formatted Graphviz non-directed graph using the dot formatter?

I can assure you you don't want to look at my Graphviz sources... But in case you insist... Take them and laugh. Or cry. Of course, this file comes from a hand-crafted template, but has some autogenerated bits to it. I have still to tweak it quite a bit to correct several of its usability shortcomings, but at least it looks somewhat like what I want to achieve.
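For the curious, the general trick amounts to something like this: an undirected dot graph with rankdir=LR and filled box nodes standing in for the icicle slabs. This is a hand-written toy example (not my actual template), using the category colors described above:

```dot
graph icicle {
  rankdir=LR;
  node [shape=box, style=filled];
  root      [label="Operational risk",                  fillcolor=white];
  people    [label="Actions of people",                 fillcolor=lightblue];
  systems   [label="Systems and technology\nfailures",  fillcolor=lightgreen];
  processes [label="Failed internal processes",         fillcolor=lightpink];
  external  [label="External events",                   fillcolor=lightgray];
  root -- people;
  root -- systems;
  root -- processes;
  root -- external;
}
```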

Anyway, I started out by making a "dear lazyweb" question. So, here it goes: Do you think I'm using the right visualization for my data? Do you have any better suggestions, either of a graph or of a graph-generating tool?


[update] Thanks for the first pointer, Lazyweb! I found a beautiful solution; we will see whether it is what I need or not (it is too space-greedy to be readable... but I will check it out more thoroughly). It lays out much better than anything I can spew out by myself. Writing it as a mindmap using TikZ directly from within LaTeX, I get the following result:

Rondam RamblingsHard to say which is worse

I'm not sure which circumstance is the more disturbing, the fact that my health insurance is hanging by the thinnest of threads, or the fact that the only reason I have even that faint hope to cling to is that the freedom caucus doesn't think the AHCA bill is horrible enough.  They want to chip away the requirements that insurance plans provide comprehensive coverage, thereby fragmenting (and

Krebs on SecurityPhishing 101 at the School of Hard Knocks

A recent, massive spike in sophisticated and successful phishing attacks is prompting many universities to speed up timetables for deploying mandatory two-factor authentication (2FA) — requiring a one-time code in addition to a password — for access to student and faculty services online. This is the story of one university that accelerated plans to require 2FA after witnessing nearly twice as many phishing victims in the first two-and-a-half months of this year as it saw in all of 2015.

Bowling Green State University in Ohio has more than 20,000 students and faculty, and like virtually any other mid-sized state school, its Internet users are constantly under attack from scammers trying to phish login credentials for email and online services.

BGSU had planned later this summer to make 2FA mandatory for access to the school’s portal — the primary place where students register for classes, pay bills, and otherwise manage their financial relationship to the university.

That is, until a surge in successful phishing attacks resulted in several students having bank accounts and W-2 tax forms siphoned.

On March 1, 2017 all BGSU account holders were required to change their passwords, and on March 15, 2017 two-factor authentication (Duo) protection was placed in front of the MyBGSU portal [full disclosure: Duo is a longtime advertiser on KrebsOnSecurity].

Matt Haschak, director of IT security and infrastructure at BGSU, said the number of compromised accounts detected at BGSU has risen from 250 in calendar year 2015 to 1000 in 2016, and to approximately 400 in the first 75 days of 2017.

Left unchecked, phishers are on track to steal credentials from nearly 10 percent of the BGSU student body by the end of this year. The university has offered 2FA options for its portal access since June 2016, but until this month few students or faculty were using it, Haschak said.

“We saw very low adoption when it was voluntary,” he said. “And typically the people who adopted it were not my big security risks.”

Haschak said it’s clear that the scale and size of the phishing problem is hardly unique to BGSU.

“As I keep preaching to our campus community, this is not unique to BGSU,” Haschak said. “I’ve been talking a lot lately to my counterparts at universities in Ohio and elsewhere, and we’re all getting hit with these attacks very heavily right now. Some of the phishing scams are pretty good, but unfortunately some are god-awful, and I think people are just not thinking or they’re too busy in their day, they receive something on their phone and they just click it.”

Last month, an especially tricky phishing scam fooled several students who are also employed at the university into giving away their BGSU portal passwords, after which the thieves changed the victims’ direct deposit information so that their money went to accounts controlled by the phishers.

In other scams, the phishers would change the routing number for a bank account tied to a portal user, and then cancel that student’s classes near the beginning of a semester — thus kicking off a fraudulent refund.

One of the victims even had a fraudulent tax refund request filed in her name with the IRS as a result, Haschak said.

“They went in and looked at her W-2 information, which is also available via the portal,” he said.

While BGSU sends an email each time account information is changed, the thieves also have been phishing faculty and staff email accounts — which allows the crooks to delete the notification emails.

“The bad guys also went in and deleted the emails we sent, and then deleted the messages from the victim’s trash folder,” Haschak said.

Part of BGSU's messaging to students and faculty about the new 2FA requirements for university portal access.

Part of BGSU’s messaging to students and faculty about the new 2FA requirements for university portal access.

Ultimately, BGSU opted to roll out 2FA in a second stage for university email, mainly because of the logistics and support issues involved, but also because they wanted to focus on protecting the personally identifiable information in the BGSU portal as quickly as possible.

For now, BGSU is working on automating the opt-in for 2FA on university email. The 2FA system in front of its portal provides several 2FA options for students, including the Duo app, security tokens, or one-time codes sent via phone or SMS.

“If the numbers of compromised accounts keep increasing at the rate they are, we may get to that next level a lot sooner than our current roadmap for email access,” Haschak said.

2FA, also called multi-factor authentication or two-step verification, is a great way to dramatically improve the security of an online account — whether it’s at your bank, a file-sharing service, or your email. The idea is that even if thieves manage to snag your username and password — through phishing or via password-stealing malware — they still need access to that second factor to successfully impersonate you to the system.

Are you taking full advantage of 2FA options available to your various online accounts? Check out to find out where you might be able to harden your online account security.

Sociological ImagesRace, Gender, and Book Reviews

Flashback Friday.

In a post at Fairness and Accuracy in Reporting, Steve Rendall and Zachary Tomanelli investigated the racial breakdown of the book reviewers and authors in two important book review venues, the New York Times Book Review and C-SPAN’s After Words.  They found that the vast majority of both reviewers and authors were white males.

Overall, 95% of the authors and 96% of the reviewers were non-Latino white (compare that with the fact that whites are just over 60% of the U.S. population as of 2016).

Women accounted for between 13 and 31% of the authors and reviewers:

This is some hard data showing that white men’s ideas are made more accessible than the ideas of others, likely translating into greater influence on social discourse and public policy.  These individuals certainly don’t all say the same thing, nor do they necessarily articulate ideas that benefit white men, but a greater diversity of perspectives would certainly enrich our discourse.

Via Scatterplot.

Originally posted in September, 2010.

Lisa Wade, PhD is a professor at Occidental College. She is the author of American Hookup, a book about college sexual culture, and a textbook about gender. You can follow her on Twitter, Facebook, and Instagram.


Planet Linux AustraliaJames Morris: Linux Security Summit 2017: CFP Announcement


The 2017 Linux Security Summit CFP (Call for Participation) is now open!

See the announcement here.

The summit this year will be held in Los Angeles, USA on 14-15 September. It will be co-located with the Open Source Summit (formerly LinuxCon), and the Linux Plumbers Conference. We’ll follow essentially the same format as the 2016 event (you can find the recap here).

The CFP closes on June 5th, 2017.

Planet DebianJo Shields: Mono repository changes, beginning Mono vNext

Up to now, Linux packages on have come in two flavours – RPM built for CentOS 7 (and RHEL 7), and .deb built for Debian 7. Universal packages that work on the named distributions, and anything newer.

Except that’s not entirely true.

Firstly, there have been “compatibility repositories” users need to add, to deal with ABI changes in libtiff, libjpeg, and Apache, since Debian 7. Then there’s the packages for ARM64 and PPC64el – neither of those architectures is available in Debian 7, so they’re published in the 7 repo but actually built on 8.

A large reason for this is difficulty in our package publishing pipeline – apt only allows one version-architecture mix in the repository at once, so I can’t have, say, built on AMD64 on both Debian 7 and Ubuntu 16.04.

We’ve been working hard on a new package build/publish pipeline, which can properly support multiple distributions, based on Jenkins Pipeline. This new packaging system also resolves longstanding issues such as “can’t really build anything except Mono” and “Architecture: All packages still get built on Jo’s laptop, with no public build logs”.

So, here’s the old build matrix:

Distribution Architectures
Debian 7 ARM hard float, ARM soft float, ARM64 (actually Debian 8), AMD64, i386, PPC64el (actually Debian 8)
CentOS 7 AMD64

And here’s the new one:

Distribution Architectures
Debian 7 ARM hard float (v7), ARM soft float, AMD64, i386
Debian 8 ARM hard float (v7), ARM soft float, ARM64, AMD64, i386, PPC64el
Raspbian 8 ARM hard float (v6)
Ubuntu 14.04 ARM hard float (v7), ARM64, AMD64, i386, PPC64el
Ubuntu 16.04 ARM hard float (v7), ARM64, AMD64, i386, PPC64el
CentOS 6 AMD64, i386
CentOS 7 AMD64

The compatibility repositories will no longer be needed on recent Ubuntu or Debian – just use the right repository for your system. If your distribution isn’t listed… sorry, but we need to draw a line somewhere on support, and the distributions listed here are based on heavy analysis of our web server logs and bug requests.

You’ll want to change your package manager repositories to reflect your system more accurately, once Mono vNext is published. We’re debating some kind of automated handling of this, but I’m loath to touch users’ sources.list without their knowledge.
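For illustration, the change would boil down to pointing apt at the suite matching your distribution. Something like the following, where both the base URL and the suite name are placeholders I made up, not the real repository layout:

```
# /etc/apt/sources.list.d/mono-official.list  (hypothetical URL and suite)
deb https://REPO-BASE/repo ubuntu-xenial main
```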

CentOS builds are going to be late – I’ve been doing all my prototyping against the Debian builds, as I have better command of the tooling. Hopefully no worse than a week or two.

Worse Than FailureError'd: {{$Errord_title = null}}

"Wow! Those folks from null and undefined must be big fans! I mean, just look at that voting turnout!" Kayleigh wrote.


"Ah, Google News, you never fail to find the bugs in news sites' pages," wrote Paul B.


Geof writes, "Based on the reservation name, I hope that I won't be eating alone."


"Viber blocks 'null' friends because no one needs friends that are never there when you need them," Chizzy wrote.


"Thanks AT&T for the personalized video tour of, apparently, nobody's bill," writes Zach K.


Kevin L. writes, "So...will I have to hold the deposit, and null deposit to enter the lease?"


"Looks like little Bobby Tables signed up for an account with Vodafone NZ just before my mother tried to," wrote Peter G.



Planet DebianSylvain Beucler: Practical basics of reproducible builds

As GNU FreeDink upstream, I'd very much like to offer pre-built binaries: one (1) official, tested, current, distro-agnostic version of the game with its dependencies.
I'm actually already doing that for the Windows version.
One issue though: people have to trust me -- and my computer's integrity.
Reproducible builds could address that.
My release process is tightly controlled, but is my project reproducible? If not, what do I need? Let's check!

I quickly see that documentation is getting better, namely :)
(The first docs I read on reproducibility looked more like a crazed date-o-phobic rant than an actual solution - plus now we have SOURCE_DATE_EPOCH implemented in gcc ;))

However I was left unsatisfied by the very high-level viewpoint and the lack of concrete examples.
The document points to various issues but is very vague about what tools are impacted.

So let's do some tests!

Let's start with a trivial program:

$ cat > hello.c
#include <stdio.h>
int main(void) {
    printf("Hello, world!\n");
}

OK, first does GCC compile this reproducibly?
I'm not sure because I heard of randomness in identifiers and such in the compilation process...

$ gcc-5 hello.c -o hello-5
$ md5sum hello-5
a00416d7392442321bad4afc5a461321  hello-5
$ gcc-5 hello.c -o hello-5
$ md5sum hello-5
a00416d7392442321bad4afc5a461321  hello-5

Cool, ELF compiler output is stable through time!
Now do 2 versions of GCC compile a hello world identically?

$ gcc-6 hello.c -o hello-6
$ md5sum hello-6
f7f52c2f5f82fe2a95061a771a6c5acd  hello-6
$ hexcompare hello-5 hello-6
[lots of red]

Well let's not get our hopes too high ;)
Trivial build options change?

$ gcc-6 hello.c -lc -o hello-6
$ gcc-6 -lc hello.c -o hello-6b
$ md5sum hello-6 hello-6b
f7f52c2f5f82fe2a95061a771a6c5acd  hello-6
f73ee6d8c3789fd8f899f5762025420e  hello-6b
$ hexcompare hello-6 hello-6b
[lots of red]

OK, let's be very careful with build options then. What about 2 different build paths?

$ cd ..
$ cp -a repro/ repro2/
$ cd repro2/
$ gcc-6 hello.c -o hello-6
$ md5sum hello-6
f7f52c2f5f82fe2a95061a771a6c5acd  hello-6

Basic compilation is stable across directories.
Now I tried recompiling FreeDink identically from 2 different git clones.

$ md5sum freedink/native/src/freedink freedink2/native/src/freedink
839ccd9180c72343e23e5d9e2e65e237  freedink/native/src/freedink
6d5dc6aab321fab01b424ac44c568dcf  freedink2/native/src/freedink
$ hexcompare freedink2/native/src/freedink freedink/native/src/freedink
[lots of red]

Hmm, what about stripped versions?

$ strip freedink/native/src/freedink freedink2/native/src/freedink
$ md5sum freedink/native/src/freedink freedink2/native/src/freedink
415e96bb54456f3f2a759f404f18c711  freedink/native/src/freedink
e0702d798807c83d21f728106c9261ad  freedink2/native/src/freedink
$ hexcompare freedink/native/src/freedink freedink2/native/src/freedink
[1 single red spot]

OK, what's happening? diffoscope to the rescue:

$ diffoscope freedink/native/src/freedink freedink2/native/src/freedink
--- freedink/native/src/freedink
+++ freedink2/native/src/freedink
├── readelf --wide --notes {}
│ @@ -3,8 +3,8 @@
│    Owner                 Data size  Description
│    GNU                  0x00000010  NT_GNU_ABI_TAG (ABI version tag)
│      OS: Linux, ABI: 2.6.32
│  Displaying notes found in:
│    Owner                 Data size  Description
│    GNU                  0x00000014  NT_GNU_BUILD_ID (unique build ID bitstring)
│ -    Build ID: a689574d69072bb64b28ffb82547e126284713fa
│ +    Build ID: d7be191a61e84648a58c18e9c108b3f3ce500302

What on earth is this Build ID and how is it computed?
After much digging, I find it's a 2008 plan with application in selecting matching detached debugging symbols. is the most detailed overview/rationale I found.
It is supposed to be computed from parts of the binary. It's actually pretty resistant to changes, e.g. I could add the missing "return 0;" in my hello source and get the exact same Build ID!
On the other hand my FreeDink binaries do match except for the Build ID so there must be a catch.

Let's try our basic example with default ./configure CFLAGS:

$ (cd repro/ && gcc -g -O2 hello.c -o hello)
$ (cd repro/ && gcc -g -O2 hello.c -o hello-b)
$ md5sum repro/hello repro/hello-b
6b2cd79947d7c5ed2e505ddfce167116  repro/hello
6b2cd79947d7c5ed2e505ddfce167116  repro/hello-b
# => OK for now

$ (cd repro2/ && gcc -g -O2 hello.c -o hello)
$ md5sum repro2/hello
20b4d09d94de5840400be05bc76e4172  repro2/hello
$ strip repro/hello repro2/hello
$ diffoscope repro/hello repro2/hello
--- repro/hello
+++ repro2/hello2
├── readelf --wide --notes {}
│ @@ -3,8 +3,8 @@
│    Owner                 Data size  Description
│    GNU                  0x00000010  NT_GNU_ABI_TAG (ABI version tag)
│      OS: Linux, ABI: 2.6.32
│  Displaying notes found in:
│    Owner                 Data size  Description
│    GNU                  0x00000014  NT_GNU_BUILD_ID (unique build ID bitstring)
│ -    Build ID: 462a3c613537bb57f20bd3ccbe6b7f6d2bdc72ba
│ +    Build ID: b4b448cf93e7b541ad995075d2b688ef296bd88b
# => issue reproduced with -g -O2 and different build directories

$ (cd repro/ && gcc -O2 hello.c -o hello)
$ (cd repro2/ && gcc -O2 hello.c -o hello)
$ md5sum repro/hello repro2/hello
1571d45eb5807f7a074210be17caa87b  repro/hello
1571d45eb5807f7a074210be17caa87b  repro2/hello
# => culprit is not -O2, so culprit is -g

Bummer. So the build ID must be computed also from the debug symbols, even if I strip them afterwards :(
OK, so when says "Some tools will record the path of the source files in their output", that means the compiler, and more importantly the stripped executable.

Conclusion: apparently to achieve reproducible builds I need identical full build paths and to keep track of them.
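A possible follow-up experiment, for what it's worth: GCC's -fdebug-prefix-map option (newer releases also spell it -ffile-prefix-map) rewrites the build directory inside the debug info, which to my understanding should make the -g builds above match across paths. A sketch, assuming a reasonably recent gcc:

```shell
# Same trivial source, two different build directories
mkdir -p repro repro2
cat > repro/hello.c <<'EOF'
#include <stdio.h>
int main(void) {
    printf("Hello, world!\n");
    return 0;
}
EOF
cp repro/hello.c repro2/hello.c

# Map each absolute build path to "." so the debug info (and hence the
# Build ID computed over it) no longer depends on the directory;
# -gno-record-gcc-switches keeps the differing flag value itself out of
# the DW_AT_producer string
(cd repro  && gcc -g -O2 -gno-record-gcc-switches -fdebug-prefix-map="$PWD"=. hello.c -o hello)
(cd repro2 && gcc -g -O2 -gno-record-gcc-switches -fdebug-prefix-map="$PWD"=. hello.c -o hello)
md5sum repro/hello repro2/hello
```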

What about Windows/MinGW btw?

$ /opt/mxe/usr/bin/i686-w64-mingw32.static-gcc hello.c -o hello.exe
$ md5sum hello.exe 
e0fa685f6866029b8e03f9f2837dc263  hello.exe
$ /opt/mxe/usr/bin/i686-w64-mingw32.static-gcc hello.c -o hello.exe
$ md5sum hello.exe 
df7566c0ac93ea4a0b53f4af83d7fbc9  hello.exe
$ /opt/mxe/usr/bin/i686-w64-mingw32.static-gcc hello.c -o hello.exe
$ md5sum hello.exe 
bbf4ab22cbe2df1ddc21d6203e506eb5  hello.exe

PE compiler output is not stable through time.
(any clue?)

OK, there's still a long road ahead of us...

There are lots of other questions.
Is autoconf output reproducible?
Does it actually matter if autoconf is reproducible if upstream is providing a pre-generated ./configure?
If not what about all the documentation on making tarballs reproducible, along with the strip-nondeterminism tool?
Where do we draw the line between build and build environment?
What are the legal issues of distributing a docker-based build environment without every single matching distro source packages?
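On the tarball question: much of it comes down to pinning every piece of metadata tar records. A minimal sketch with GNU tar (--sort=name needs tar >= 1.28; SOURCE_DATE_EPOCH is the convention promoted for clamping timestamps):

```shell
# Build the same tarball twice with all varying metadata pinned
mkdir -p dist
printf 'hello\n' > dist/file.txt
export SOURCE_DATE_EPOCH=1490000000

for out in dist-a.tar dist-b.tar; do
    tar --sort=name \
        --mtime="@${SOURCE_DATE_EPOCH}" \
        --owner=0 --group=0 --numeric-owner \
        -cf "$out" dist/
done
md5sum dist-a.tar dist-b.tar
```

If the archive is gzipped, gzip -n is needed as well, since gzip otherwise embeds its own timestamp.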

That was my modest contribution to practical reproducible builds documentation for developers, I'd very much like to hear about more of it.
Who knows, maybe in the near future we'll get reproducible official builds for Eclipse, ZAP, JetBrains, Krita, Android SDK/NDK... :)

Planet DebianDirk Eddelbuettel: RApiDatetime 0.0.1

Very happy to announce a new package of mine is now up on the CRAN repository network: RApiDatetime.

It provides six entry points for C-level functions of the R API for Date and Datetime calculations: asPOSIXlt and asPOSIXct convert between long and compact datetime representations, formatPOSIXlt and Rstrptime convert to and from character strings, and POSIXlt2D and D2POSIXlt convert between Date and POSIXlt datetime. These six functions are all fairly essential and useful, but not one of them was previously exported by R. Hence the need to put them together in this package to complete the accessible API somewhat.

These should be helpful for fellow package authors, as many of us either keep our own partial copies of some of this code, or farm the work back out to R to get it done.

As a simple (yet real!) illustration, here is an actual Rcpp function which we could now implement at the C level rather than having to go back up to R (via Rcpp::Function()):

    inline Datetime::Datetime(const std::string &s, const std::string &fmt) {
        Rcpp::Function strptime("strptime");    // we cheat and call strptime() from R
        Rcpp::Function asPOSIXct("as.POSIXct"); // and we need to convert to POSIXct
        m_dt = Rcpp::as<double>(asPOSIXct(strptime(s, fmt)));
    }

I had taken a first brief stab at this about two years ago, but never finished. With the recent emphasis on C-level function registration, coupled with a possible use case from anytime, I more or less put this together last weekend.

It currently builds and tests fine on POSIX-alike operating systems. If someone with some skill and patience in working on Windows would like to help complete the Windows side of things then I would certainly welcome help and pull requests.

For questions or comments please use the issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.


Planet Linux Australiasthbrx - a POWER technical blog: Erasure Coding for Programmers, Part 2

We left part 1 having explored GF(2^8) and RAID 6, and asking the question "what does all this have to do with Erasure Codes?"

Basically, the thinking goes "RAID 6 is cool, but what if, instead of two parity disks, we had an arbitrary number of parity disks?"

How would we do that? Well, let's introduce our new best friend: Coding Theory!

Say we want to transmit some data across an error-prone medium. We don't know where the errors might occur, so we add some extra information to allow us to detect and possibly correct for errors. This is a code. Codes are a largish field of engineering, but rather than show off my knowledge about systematic linear block codes, let's press on.

Today, our error-prone medium is an array of inexpensive disks. Now we make this really nice assumption about disks, namely that they are either perfectly reliable or completely missing. In other words, we consider that a disk will either be present or 'erased'. We come up with 'erasure codes' that are able to reconstruct data when it is known to be missing. (This is a slightly different problem to being able to verify and correct data that might or might not be subtly corrupted. Disks also have to deal with this problem, but it is not something erasure codes address!)

The particular code we use is a Reed-Solomon code. The specific details are unimportant, but there's a really good graphical outline of the broad concepts in sections 1 and 3 of the Jerasure paper/manual. (Don't go on to section 4.)

That should give you some background on how this works at a pretty basic mathematical level. Implementation is a matter of mapping that maths (matrix multiplication) onto hardware primitives, and making it go fast.


I'm deliberately not covering some pretty vast areas of what would be required to write your own erasure coding library from scratch. I'm not going to talk about how to compose the matrices, how to invert them, or anything like that. I'm not sure how that would be a helpful exercise - ISA-L and jerasure already exist and do that for you.

What I want to cover is an efficient implementation of some of the algorithms, once you have the matrices nailed down.

I'm also going to assume your library already provides a generic multiplication function in GF(2^8). That's required to construct the matrices, so it's a pretty safe assumption.
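For concreteness, here is one minimal scalar way such a multiplication can be written, assuming the same 0x11d polynomial the RAID 6 paper uses. This is only a sketch - a real library's gf_mul will typically be table-driven rather than bit-by-bit:

```c
#include <assert.h>

/* Russian-peasant multiplication in GF(2^8), reducing modulo the
 * polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d), as used by RAID 6. */
static unsigned char gf_mul(unsigned char a, unsigned char b)
{
    unsigned char p = 0;
    while (b) {
        if (b & 1)
            p ^= a;               /* "add" (XOR) a for this bit of b */
        /* multiply a by x, reducing if the high bit falls off */
        a = (unsigned char)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
        b >>= 1;
    }
    return p;
}
```

As a sanity check, {03} * {07} = {09} under this polynomial.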

The beginnings of an API

Let's make this a bit more concrete.

This will be heavily based on the ISA-L API but you probably want to plug into ISA-L anyway, so that shouldn't be a problem.

What I want to do is build up from very basic algorithmic components into something useful.

The first thing we want to be able to do is Galois Field multiplication of an entire region of bytes by an arbitrary constant.

We basically want gf_vect_mul(size_t len, <something representing the constant>, unsigned char * src, unsigned char * dest)

Simple and slow approach

The simplest way is to do something like this:

void gf_vect_mul_simple(size_t len, unsigned char c, unsigned char * src, unsigned char * dest) {

    size_t i;
    for (i=0; i<len; i++) {
        dest[i] = gf_mul(c, src[i]);
    }
}

That does multiplication element by element using the library's supplied gf_mul function, which - as the name suggests - does GF(2^8) multiplication of a scalar by a scalar.

This works. The problem is that it is very, painfully, slow - in the order of a few hundred megabytes per second.

Going faster

How can we make this faster?

There are a few things we can try: if you want to explore a whole range of different ways to do this, check out the gf-complete project. I'm going to assume we want to skip right to the end and use the fastest approach we've found.

Cast your mind back to the RAID 6 paper (PDF) I talked about in part 1. That had a way of doing an efficient multiplication in GF(2^8) using vector instructions.

To refresh your memory, we split the multiplication into two parts - low bits and high bits, looked them up separately in a lookup table, and joined them with XOR. We then discovered that on modern Power chips, we could do that in one instruction with vpermxor.

So, a very simple way to do this would be:

  • generate the table for a
  • for each 16-byte chunk of our input:
    • load the input
    • do the vpermxor with the table
    • save it out

Generating the tables is reasonably straight-forward, in theory. Recall that the tables are a * {{00},{01},...,{0f}} and a * {{00},{10},..,{f0}} - a couple of loops in C will generate them without difficulty. ISA-L has a function to do this, as does gf-complete in split-table mode, so I won't repeat them here.
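To make the table idea concrete, here is a scalar sketch of both the table generation and the low/high-nibble lookup the tables enable. Note this is illustrative only - the exact layout ISA-L and gf-complete expect may differ - and the function names here are my own, not library API:

```c
#include <assert.h>

/* Multiply by x in GF(2^8) modulo 0x11d. */
static unsigned char xtime(unsigned char a)
{
    return (unsigned char)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
}

/* Fill tbl[0..15] with a * {00,01,...,0f} and tbl[16..31] with
 * a * {00,10,...,f0}, built from the a * x^k "powers" so we don't
 * need a general multiply. */
static void gf_split_tables(unsigned char a, unsigned char tbl[32])
{
    unsigned char pow[8];   /* pow[k] = a * x^k */
    int i, k;

    pow[0] = a;
    for (k = 1; k < 8; k++)
        pow[k] = xtime(pow[k - 1]);

    for (i = 0; i < 16; i++) {
        unsigned char lo = 0, hi = 0;
        for (k = 0; k < 4; k++) {
            if (i & (1 << k)) {
                lo ^= pow[k];       /* bit k of the low nibble */
                hi ^= pow[k + 4];   /* same bit in the high nibble */
            }
        }
        tbl[i] = lo;
        tbl[16 + i] = hi;
    }
}

/* Scalar equivalent of the vpermxor trick: look up each nibble
 * separately, then join the two partial products with XOR. */
static unsigned char gf_mul_split(const unsigned char tbl[32], unsigned char b)
{
    return (unsigned char)(tbl[b & 0x0f] ^ tbl[16 + (b >> 4)]);
}
```

The vector version does exactly this, 16 bytes at a time, with the two tables living in vector registers.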

So, let's recast our function to take the tables as an input rather than the constant a. Assume we're provided the two tables concatenated into one 32-byte chunk. That would give us:

void gf_vect_mul_v2(size_t len, unsigned char * table, unsigned char * src, unsigned char * dest)

Here's how you would do it in C:

void gf_vect_mul_v2(size_t len, unsigned char * table, unsigned char * src, unsigned char * dest) {
        vector unsigned char tbl1, tbl2, in, out;
        size_t i;

        /* Assume table, src, dest are aligned and len is a multiple of 16 */

        tbl1 = vec_ld(16, table);
        tbl2 = vec_ld(0, table);
        for (i=0; i<len; i+=16) {
            in = vec_ld(i, (unsigned char *)src);
            __asm__("vpermxor %0, %1, %2, %3" : "=v"(out) : "v"(tbl1), "v"(tbl2), "v"(in));
            vec_st(out, i, (unsigned char *)dest);
        }
}

There are a few quirks to iron out - making sure the table is laid out in the vector register the way you expect, etc. - but this generally works and is quite fast: my Power 8 VM does about 17-18 GB/s with non-cache-contained data with this implementation.

We can go a bit faster by doing larger chunks at a time:

    for (i=0; i<vlen; i+=64) {
            in1 = vec_ld(i, (unsigned char *)src);
            in2 = vec_ld(i+16, (unsigned char *)src);
            in3 = vec_ld(i+32, (unsigned char *)src);
            in4 = vec_ld(i+48, (unsigned char *)src);
            __asm__("vpermxor %0, %1, %2, %3" : "=v"(out1) : "v"(tbl1), "v"(tbl2), "v"(in1));
            __asm__("vpermxor %0, %1, %2, %3" : "=v"(out2) : "v"(tbl1), "v"(tbl2), "v"(in2));
            __asm__("vpermxor %0, %1, %2, %3" : "=v"(out3) : "v"(tbl1), "v"(tbl2), "v"(in3));
            __asm__("vpermxor %0, %1, %2, %3" : "=v"(out4) : "v"(tbl1), "v"(tbl2), "v"(in4));
            vec_st(out1, i, (unsigned char *)dest);
            vec_st(out2, i+16, (unsigned char *)dest);
            vec_st(out3, i+32, (unsigned char *)dest);
            vec_st(out4, i+48, (unsigned char *)dest);
    }

This goes at about 23.5 GB/s.

We can go one step further and do the core loop in assembler - that means we control the instruction layout and so on. I tried this: it turns out that for the basic vector multiply loop, if we turn off ASLR and pin to a particular CPU, we can see an improvement of a few percent (and a decrease in variability) over C code.

Building from vector multiplication

Once you're comfortable with the core vector multiplication, you can start to build more interesting routines.

A particularly useful one on Power turned out to be the multiply and add routine: like gf_vect_mul, except that rather than overwriting the output, it loads the output and xors the product in. This is a simple extension of the gf_vect_mul function so is left as an exercise to the reader.

The next step would be to start building erasure coding proper. Recall that to get an element of our output, we take a dot product: we take the corresponding input element of each disk, multiply it with the corresponding GF(2^8) coding matrix element and sum all those products. So all we need now is a dot product algorithm.

One approach is the conventional dot product:

  • for each element
    • zero accumulator
    • for each source
      • load input[source][element]
      • do GF(2^8) multiplication
      • xor into accumulator
    • save accumulator to output[element]

The other approach is multiply and add:

  • for each source
    • for each element
      • load input[source][element]
      • do GF(2^8) multiplication
      • load output[element]
      • xor in product
      • save output[element]

The dot product approach has the advantage of fewer writes. The multiply and add approach has the advantage of better cache/prefetch performance. The approach you ultimately go with will probably depend on the characteristics of your machine and the length of data you are dealing with.

For what it's worth, ISA-L ships with only the first approach in x86 assembler, and Jerasure leans heavily towards the second approach.
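In scalar C, the two loop orders look roughly like this (a sketch with hypothetical names; gf_mul is the same bit-by-bit helper sketched earlier, repeated so this fragment stands alone):

```c
#include <assert.h>
#include <stddef.h>

/* Scalar GF(2^8) multiply modulo 0x11d, repeated for self-containment. */
static unsigned char gf_mul(unsigned char a, unsigned char b)
{
    unsigned char p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (unsigned char)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
        b >>= 1;
    }
    return p;
}

/* Approach 1: conventional dot product - one pass per output element,
 * accumulating across all k sources before a single write. */
static void dot_prod(size_t len, int k, const unsigned char *coeff,
                     unsigned char **src, unsigned char *dest)
{
    for (size_t e = 0; e < len; e++) {
        unsigned char acc = 0;
        for (int s = 0; s < k; s++)
            acc ^= gf_mul(coeff[s], src[s][e]);
        dest[e] = acc;
    }
}

/* Approach 2: multiply and add - stream each source once, XORing its
 * scaled contribution into the output. dest must start zeroed. */
static void mul_add(size_t len, int k, const unsigned char *coeff,
                    unsigned char **src, unsigned char *dest)
{
    for (int s = 0; s < k; s++)
        for (size_t e = 0; e < len; e++)
            dest[e] ^= gf_mul(coeff[s], src[s][e]);
}
```

Both compute the same outputs; only the memory access pattern differs, which is exactly the trade-off discussed above.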

Once you have a vector dot product sorted, you can build a full erasure coding setup: build your tables with your library, then do a dot product to generate each of your outputs!

In ISA-L, this is implemented something like this:

/*
 * ec_encode_data_simple(length of each data input, number of inputs,
 *                       number of outputs, pre-generated GF(2^8) tables,
 *                       input data pointers, output code pointers)
 */
void ec_encode_data_simple(int len, int k, int rows, unsigned char *g_tbls,
                           unsigned char **data, unsigned char **coding)
{
        while (rows) {
                gf_vect_dot_prod(len, k, g_tbls, data, *coding);
                g_tbls += k * 32;
                coding++;
                rows--;
        }
}

Going faster still

Eagle-eyed readers will notice that however we generate an output, we have to read all the input elements. This means that if we're doing a code with 10 data disks and 4 coding disks, we have to read each of the 10 inputs 4 times.

We could do better if we could calculate multiple outputs for each pass through the inputs. This is a little fiddly to implement, but does lead to a speed improvement.

ISA-L is an excellent example here. Intel goes up to 6 outputs at once: the number of outputs you can do is only limited by how many vector registers you have to put the various operands and results in.

Tips and tricks

  • Benchmarking is tricky. I do the following on a bare-metal, idle machine, with ASLR off and pinned to an arbitrary hardware thread. (Code is for the fish shell)

    for x in (seq 1 50)
        setarch ppc64le -R taskset -c 24 erasure_code/gf_vect_mul_perf
    end | awk '/MB/ {sum+=$13} END {print sum/50, "MB/s"}'
  • Debugging is tricky; the more you can do in C and the less you do in assembly, the easier your life will be.

  • Vector code is notoriously alignment-sensitive - if you can't figure out why something is wrong, check alignment. (Pro-tip: ISA-L does not guarantee the alignment of the gftbls parameter, and many of the tests supply an unaligned table from the stack. For testing __attribute__((aligned(16))) is your friend!)

  • Related: GCC is moving towards assignment over vector intrinsics, at least on Power:

    vector unsigned char a;
    unsigned char * data;
    // good, also handles word-aligned data with VSX
    a = *(vector unsigned char *)data;
    // bad, requires special handling of non-16-byte aligned data
    a = vec_ld(0, (unsigned char *) data);


Hopefully by this point you're equipped to figure out how your erasure coding library of choice works, and write your own optimised implementation (or maintain an implementation written by someone else).

I've referred to a number of resources throughout this series:

If you want to go deeper, I also read the following and found them quite helpful in understanding Galois Fields and Reed-Solomon coding:

For a more rigorous mathematical approach to rings and fields, a university mathematics course may be of interest. For more on coding theory, a university course in electronics engineering may be helpful.

Harald WelteUpcoming v3 of Open Hardware miniPCIe WWAN modem USB breakout board

Back in October 2016 I designed a small open hardware breakout board for WWAN modems in mPCIe form-factor. I was thinking some other people might be interested in this, and indeed, the first manufacturing batch is already sold out by now.

Instead of ordering more of the old (v2) design, I decided to do some improvements in the next version:

  • add mounting holes so the PCB can be mounted via M3 screws
  • add U.FL and SMA sockets, so the modems are connected via a short U.FL to U.FL cable, and external antennas or other RF components can be attached via SMA. This provides strain relief for the external antenna or cabling and avoids tearing off any of the current loose U.FL to SMA pigtails
  • flip the SIM slot to the top side of the PCB, so it can be accessed even after mounting the board to some base plate or enclosure via the mounting holes
  • more meaningful labeling of the silk screen, including the purpose of the jumpers and the input voltage.

A software rendering of the resulting v3 PCB design files that I just sent for production looks like this:


Like before, the design of the board (including schematics and PCB layout design files) is available as open hardware under CC-BY-SA license terms. For more information see

It is expected to take some three weeks until I see the first assembled boards.

I'm also planning to do a M.2 / NGFF version of it, but haven't found the time to get around doing it so far.

Planet DebianSimon McVittie: GTK hackfest 2017: D-Bus communication with containers

At the GTK hackfest in London (which accidentally became mostly a Flatpak hackfest) I've mainly been looking into how to make D-Bus work better for app container technologies like Flatpak and Snap.

The initial motivating use cases are:

  • Portals: Portal authors need to be able to identify whether the container is being contacted by an uncontained process (running with the user's full privileges), or whether it is being contacted by a contained process (in a container created by Flatpak or Snap).

  • dconf: Currently, a contained app either has full read/write access to dconf, or no access. It should have read/write access to its own subtree of dconf configuration space, and no access to the rest.

At the moment, Flatpak runs a D-Bus proxy for each app instance that has access to D-Bus, connects to the appropriate bus on the app's behalf, and passes messages through. That proxy is in a container similar to the actual app instance, but not actually the same container; it is trusted to not pass messages through that it shouldn't pass through. The app-identification mechanism works in practice, but is Flatpak-specific, and has a known race condition due to process ID reuse and limitations in the metadata that the Linux kernel maintains for AF_UNIX sockets. In practice the use of X11 rather than Wayland in current systems is a much larger loophole in the container than this race condition, but we want to do better in future.

Meanwhile, Snap does its sandboxing with AppArmor, on kernels where it is enabled both at compile-time (Ubuntu, openSUSE, Debian, Debian derivatives like Tails) and at runtime (Ubuntu, openSUSE and Tails, but not Debian by default). Ubuntu's kernel has extra AppArmor features that haven't yet gone upstream, some of which provide reliable app identification via LSM labels, which dbus-daemon can learn by querying its AF_UNIX socket. However, other kernels like the ones in openSUSE and Debian don't have those. The access-control (AppArmor mediation) is implemented in upstream dbus-daemon, but again doesn't work portably, and is not sufficiently fine-grained or flexible to do some of the things we'll likely want to do, particularly in dconf.

After a lot of discussion with dconf maintainer Allison Lortie and Flatpak maintainer Alexander Larsson, I think I have a plan for fixing this.

This is all subject to change: see fd.o #100344 for the latest ideas.

Identity model

Each user (uid) has some uncontained processes, plus 0 or more containers.

The uncontained processes include dbus-daemon itself, desktop environment components such as gnome-session and gnome-shell, the container managers like Flatpak and Snap, and so on. They have the user's full privileges, and in particular they are allowed to do privileged things on the user's session bus (like running dbus-monitor), and act with the user's full privileges on the system bus. In generic information security jargon, they are the trusted computing base; in AppArmor jargon, they are unconfined.

The containers are Flatpak apps, or Snap apps, or other app-container technologies like Firejail and AppImage (if they adopt this mechanism, which I hope they will), or even a mixture (different app-container technologies can coexist on a single system). They are containers (or container instances) and not "apps", because in principle, you could install com.example.MyApp 1.0, run it, and while it's still running, upgrade to com.example.MyApp 2.0 and run that; you'd have two containers for the same app, perhaps with different permissions.

Each container has a container type, which is a reversed DNS name like org.flatpak or io.snapcraft representing the container technology, and an app identifier, an arbitrary non-empty string whose meaning is defined by the container technology. For Flatpak, that string would be another reversed DNS name like com.example.MyGreatApp; for Snap, as far as I can tell it would look like example-my-great-app.

The container technology can also put arbitrary metadata on the D-Bus representation of a container, again defined and namespaced by the container technology. For instance, Flatpak would use some serialization of the same fields that go in the Flatpak metadata file at the moment.

Finally, the container has an opaque container identifier identifying a particular container instance. For example, launching com.example.MyApp twice (maybe different versions or with different command-line options to flatpak run) might result in two containers with different privileges, so they need to have different container identifiers.

Contained server sockets

App-container managers like Flatpak and Snap would create an AF_UNIX socket inside the container, bind() it to an address that will be made available to the contained processes, and listen(), but not accept() any new connections. Instead, they would fd-pass the new socket to the dbus-daemon by calling a new method, and the dbus-daemon would proceed to accept() connections after the app-container manager has signalled that it has called both bind() and listen(). (See fd.o #100344 for full details.)

Processes inside the container must not be allowed to contact the AF_UNIX socket used by the wider, uncontained system - if they could, the dbus-daemon wouldn't be able to distinguish between them and uncontained processes and we'd be back where we started. Instead, they should have the new socket bind-mounted into their container's XDG_RUNTIME_DIR and connect to that, or have the new socket set as their DBUS_SESSION_BUS_ADDRESS and be prevented from connecting to the uncontained socket in some other way. Those familiar with the kdbus proposals a while ago might recognise this as being quite similar to kdbus' concept of endpoints, and I'm considering reusing that name.

Along with the socket, the container manager would pass in the container's identity and metadata, and the method would return a unique, opaque identifier for this particular container instance. The basic fields (container technology, technology-specific app ID, container ID) should probably be added to the result of GetConnectionCredentials(), and there should be a new API call to get all of those plus the arbitrary technology-specific metadata.

When a process from a container connects to the contained server socket, every message that it sends should also have the container instance ID in a new header field. This is OK even though dbus-daemon does not (in general) forbid sender-specified future header fields, because any dbus-daemon that supported this new feature would guarantee to set that header field correctly, the existing Flatpak D-Bus proxy already filters out unknown header fields, and adding this header field is only ever a reduction in privilege.

The reasoning for using the sender's container instance ID (as opposed to the sender's unique name) is for services like dconf to be able to treat multiple unique bus names as belonging to the same equivalence class of contained processes: instead of having to look up the container metadata once per unique name, dconf can look it up once per container instance the first time it sees a new identifier in a header field. For the second and subsequent unique names in the container, dconf can know that the container metadata and permissions are identical to the one it already saw.

Access control

In principle, we could have the new identification feature without adding any new access control, by keeping Flatpak's proxies. However, in the short term that would mean we'd be adding new API to set up a socket for a container without any access control, and having to keep the proxies anyway, which doesn't seem great; in the longer term, I think we'd find ourselves adding a second new API to set up a socket for a container with new access control. So we might as well bite the bullet and go for the version with access control immediately.

In principle, we could also avoid the need for new access control by ensuring that each service that will serve contained clients does its own. However, that makes it really hard to send broadcasts and not have them unintentionally leak information to contained clients - we would need to do something more like kdbus' approach to multicast, where services know who has subscribed to their multicast signals, and that is just not how dbus-daemon works at the moment. If we're going to have access control for broadcasts, it might as well also cover unicast.

The plan is that messages from containers to the outside world will be mediated by a new access control mechanism, in parallel with dbus-daemon's current support for firewall-style rules in the XML bus configuration, AppArmor mediation, and SELinux mediation. A message would only be allowed through if the XML configuration, the new container access control mechanism, and the LSM (if any) all agree it should be allowed.

By default, processes in a container can send broadcast signals, and send method calls and unicast signals to other processes in the same container. They can also receive method calls from outside the container (so that interfaces like org.freedesktop.Application can work), and send exactly one reply to each of those method calls. They cannot own bus names, communicate with other containers, or send file descriptors (which reduces the scope for denial of service).

Obviously, that's not going to be enough for a lot of contained apps, so we need a way to add more access. I'm intending this to be purely additive (start by denying everything except what is always allowed, then add new rules), not a mixture of adding and removing access like the current XML policy language.

There are two ways we've identified for rules to be added:

  • The container manager can pass a list of rules into the dbus-daemon at the time it attaches the contained server socket, and they'll be allowed. The obvious example is that an org.freedesktop.Application needs to be allowed to own its own bus name. Flatpak apps' implicit permission to talk to portals, and Flatpak metadata like org.gnome.SessionManager=talk, could also be added this way.

  • System or session services that are specifically designed to be used by untrusted clients, like the version of dconf that Allison is working on, could opt-in to having contained apps allowed to talk to them (effectively making them a generalization of Flatpak portals). The simplest such request, for something like a portal, is "allow connections from any container to contact this service"; but for dconf, we want to go a bit finer-grained, with all containers allowed to contact a single well-known rendezvous object path, and each container allowed to contact an additional object path subtree that is allocated by dconf on-demand for that app.

Initially, many contained apps would work in the first way (and in particular sockets=session-bus would add a rule that allows almost everything), while over time we'll probably want to head towards recommending more use of the second.

Related topics

Access control on the system bus

We talked about the possibility of using a very similar ruleset to control access to the system bus, as an alternative to the XML rules found in /etc/dbus-1/system.d and /usr/share/dbus-1/system.d. We didn't really come to a conclusion here.

Allison had the useful insight that the XML rules are acting like a firewall: they're something that is placed in front of potentially-broken services, and not part of the services themselves (which, as with firewalls like ufw, makes it seem rather odd when the services themselves install rules). D-Bus system services already have total control over what requests they will accept from D-Bus peers, and if they rely on the XML rules to mediate that access, they're essentially rejecting that responsibility and hoping the dbus-daemon will protect them. The D-Bus maintainers would much prefer it if system services took responsibility for their own access control (with or without using polkit), because fundamentally the system service is always going to understand its domain and its intended security model better than the dbus-daemon can.

Analogously, when a network service listens on all addresses and accepts requests from elsewhere on the LAN, we sometimes work around that by protecting it with a firewall, but the optimal resolution is to get that network service fixed to do proper authentication and access control instead.

For system services, we continue to recommend essentially this "firewall" configuration, filling in the ${} variables as appropriate:

    <policy user="${the daemon uid under which the service runs}">
        <allow own="${the service's bus name}"/>
    </policy>
    <policy context="default">
        <allow send_destination="${the service's bus name}"/>
    </policy>

We discussed the possibility of moving towards a model where the daemon uid to be allowed is written in the .service file, together with an opt-in to "modern D-Bus access control" that makes the "firewall" unnecessary; after some flag day when all significant system services follow that pattern, dbus-daemon would even have the option of no longer applying the "firewall" (moving to an allow-by-default model) and just refusing to activate system services that have not opted in to being safe to use without it. However, the "firewall" also protects system bus clients, and services like Avahi that are not bus-activatable, against unintended access, which is harder to solve via that approach; so this is going to take more thought.

For system services' clients that follow the "agent" pattern (BlueZ, polkit, NetworkManager, Geoclue), the correct "firewall" configuration is more complicated. At some point I'll try to write up a best-practice for these.

New header fields for the system bus

At the moment, it's harder than it needs to be to provide non-trivial access control on the system bus, because on receiving a method call, a service has to remember what was in the method call, then call GetConnectionCredentials() to find out who sent it, then only process the actual request when it has the information necessary to do access control.

Allison and I had hoped to resolve this by adding new D-Bus message header fields with the user ID, the LSM label, and other interesting facts for access control. These could be "opt-in" to avoid increasing message sizes for no reason: in particular, it is not typically useful for session services to receive the user ID, because only one user ID is allowed to connect to the session bus anyway.

Unfortunately, the dbus-daemon currently lets unknown fields through without modification. With hindsight this seems an unwise design choice, because header fields are a finite resource (there are 255 possible header fields) and are defined by the D-Bus Specification. The only field that can currently be trusted is the sender's unique name, because the dbus-daemon sets that field, overwriting the value in the original message (if any).

To make it safe to rely on the new fields, we would have to make the dbus-daemon filter out all unknown header fields, and introduce a mechanism for the service to check (during connection to the bus) whether the dbus-daemon is sufficiently new that it does so. If connected to an older dbus-daemon, the service would not be able to rely on the new fields being true, so it would have to ignore the new fields and treat them as unset. The specification is sufficiently vague that making new dbus-daemons filter out unknown header fields is a valid change (it just says that "Header fields with an unknown or unexpected field code must be ignored", without specifying who must ignore them, so having the dbus-daemon delete those fields seems spec-compliant).

This all seemed fine when we discussed it in person; but GDBus already has accessors for arbitrary header fields by numeric ID, and I'm concerned that this might mean it's too easy for a system service to be accidentally insecure: It would be natural (but wrong!) for an implementor to assume that if g_dbus_message_get_header (message, G_DBUS_MESSAGE_HEADER_FIELD_SENDER_UID) returned non-NULL, then that was guaranteed to be the correct, valid sender uid. As a result, fd.o #100317 might have to be abandoned. I think more thought is needed on that one.

Unrelated topics

As happens at any good meeting, we took the opportunity of high-bandwidth discussion to cover many useful things and several useless ones. Other discussions that I got into during the hackfest included, in no particular order:

  • .desktop file categories and how to adapt them for AppStream, perhaps involving using the .desktop vocabulary but relaxing some of the hierarchy restrictions so they behave more like "tags"
  • how to build a recommended/reference "app store" around Flatpak, aiming to host upstream-supported builds of major projects like LibreOffice
  • how Endless do their content-presenting and content-consuming apps in GTK, with a lot of "tile"-based UIs with automatic resizing and reflowing (similar to responsive design), and the applicability of similar widgets to GNOME and upstream GTK
  • whether and how to switch GNOME developer documentation to Hotdoc
  • whether pies, fish and chips or scotch eggs were the most British lunch available from Borough Market
  • the distinction between stout, mild and porter

More notes are available from the GNOME wiki.


The GTK hackfest was organised by GNOME and hosted by Red Hat and Endless. My attendance was sponsored by Collabora. Thanks to all the sponsors and organisers, and the developers and organisations who attended.

CryptogramHackers Threaten to Erase Apple Customer Data

Turkish hackers are threatening to erase millions of iCloud user accounts unless Apple pays a ransom.

This is a weird story, and I'm skeptical of some of the details. Presumably Apple has decided that it's smarter to spend the money on secure backups and other security measures than to pay the ransom. But we'll see how this unfolds.

Google AdsenseGlobal Spotlight: Capitalizing on Vietnam’s digital opportunity

This week the AdSense Global Spotlight shifts its focus to Vietnam, home to nearly 90 million people and one of Southeast Asia’s fastest-growing economies. The country offers an unmissable opportunity for AdSense publishers interested in global audience growth.

First let’s look at the bigger picture: there’s no denying that the world becomes more digital each and every day. Global advertising expenditure is expected to reach $573 billion in 2017,(1) with strong demand for digital advertising, specifically mobile campaigns, as the key growth driver.
Drilling down on Vietnam, smartphone adoption continues to pick up pace in a country where many are mobile-first or mobile-only users. While Asia-Pacific accounted for 34% of all smartphone users in 2008, that share is expected to leap to 55% in 2017. Some Vietnam-based publishers have already seen mobile account for as much as 90% of their traffic.(2)

With all that in mind, there are seven key opportunities not to be missed in Vietnam this year:

  1. Fast mobile experiences: The most successful publishers in the region are devoting development resources to boost load speeds of mobile sites and ads and to improve the user experience. Recent research from DoubleClick indicates that over 50% of mobile site visits are abandoned if pages take longer than 3 seconds to load. Invest in increasing your mobile page speed to drive increases in time on site, engagement, and re-engagement.
  2. Native advertising: These ads fit into the look and feel of your website, making it a better and more effective ad experience for your visitors. Native formats include custom sponsored content, content recommendations, and in-feed ad units.
  3. Programmatic ad transactions: Programmatic advertising uses software and algorithms to match publishers’ inventory with buyers in search of ad space. An auction system ensures the ad of the buyer with the highest bid fills each space, which can add up to revenue gains for you and your site. Consider beginning with this short quiz to help you decide if the correct next step for your business is programmatic advertising.
  4. Better ad experiences: Because marketers want to buy ads with a high chance of actually being seen by users, there’s a shift towards valuing viewable impressions over served impressions. Page-level ads are a family of AdSense ad formats optimized to show when ads are likely to perform well and be seen.
  5. Digital ad serving: As you grow as a publisher, you might find you need more control to sell, schedule, deliver and measure advertising deals across multiple digital properties. DoubleClick for Publishers (DFP) Small Business is an industry-leading platform to consider, given its easy-to-use interface and built-in yield management technology.
  6. Video: According to Buffer, visual content is more than 40 times more likely to get shared on social media than other types of content. Furthermore, YouTube recently announced that over one billion hours of video content is watched on YouTube every day. If you don’t already create, embed, and monetize original video content on your site, now’s the time to incorporate video into your content strategy. 
  7. Messaging: Throughout Asia, the commercialization of social media has helped messaging platforms to evolve into central areas of ecommerce. For AdSense publishers, messaging platforms could provide an opportunity to drive mobile users to your content as a channel for content discovery. 
Remember, there are 68 million native Vietnamese speakers in the world today who are going online in growing numbers. If you’re a publisher in Vietnam, signing up for AdSense is an easy way to turn your in-demand content into profit. And for publishers already using AdSense, Vietnam presents an exceptional opportunity to grow your site visitors.  

To explore the possibilities for your business and site, don’t miss our live stream on 24th March where we’ll share even more recommendations for the Vietnamese market. You can sign up for AdSense here, and start turning your #PassionIntoProfit today.

Posted by: Jay Castro from the AdSense team

(1) eMarketer Worldwide Ad Spend Report 2016 
(2) eMarketer Vietnam Online 2016 Report

Planet DebianNeil McGovern: GNOME ED Update – Week 12

New release!

In case you haven’t seen it yet, there’s a new GNOME release – 3.24! The release is the result of 6 months’ work by the GNOME community.

The new release is a major step forward for us, with new features and improvements, and some exciting developments in how we build applications. You can read more about it in the announcement and release notes.

As always, this release was made possible partially thanks to the Friends of GNOME project. In particular, it helped us provide a Core apps hackfest in Berlin last November, which had a direct impact on this release.


GTK+ hackfest

I’ve just come back from the GTK+ hackfest in London – thanks to Red Hat and Endless for sponsoring the venues! It was great to meet a load of people who are involved with GNOME and GTK, and some great discussions were had about Flatpak and the creation of a "FlatHub" – somewhere that people can get all their latest Flatpaks from.


As I’m writing this, I’m sitting on a train to Heathrow for my flight to LibrePlanet 2017! If you’re going to be there, come and say hi. I’ve also had a load of new stickers produced, so these can brighten up your laptop.

Worse Than FailureMicro(managed)-services

Alan worked for Maria in the Books-and-Records department of a massive conglomerate. Her team was responsible for keeping all the historical customer transaction records online and accessible for auditors and regulatory inquiries. There was a ginormous quantity of records of varying sizes in countless tables, going back decades.

Maria was constantly bombarded with performance issues caused by auditors issuing queries without PK fields, or even where-clauses. Naturally, these would bring the servers to their proverbial knees and essentially prevent anyone else from doing any work.

The Red Queen with Alice, from the original illustrations of 'Through the Looking Glass'

To solve this problem, Maria decided that all auditors and regulators would be locked out of the database for purposes of direct queries. Instead, they would be provided with an API that would allow them to mimic a where-clause. The underlying code would check to see if no PKs were specified, or if a where clause was missing altogether. If so, it would run the query at a much lower priority and the auditor issuing the offending query would wait while the servers did the massive scans in the background, so the other auditors could continue working with a reasonably responsive database.
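
The routing rule Maria's API applied can be sketched as follows. The table names, primary keys, and the word-matching heuristic are all invented for illustration; only the policy (no WHERE clause or no PK filter means low priority) comes from the story.

```python
import re

# Hypothetical primary keys per table, invented for the example.
PRIMARY_KEYS = {
    "transactions": {"txn_id"},
    "customers": {"customer_id"},
}

def classify_query(table, where_clause):
    """Return 'normal' or 'low' priority for a query against `table`."""
    if not where_clause:
        return "low"                      # full scan: run in the background
    pks = PRIMARY_KEYS.get(table, set())
    # Crude heuristic: check whether any PK column name appears in the
    # WHERE clause at all. A real implementation would parse the clause.
    referenced = set(re.findall(r"\w+", where_clause))
    if pks and not (pks & referenced):
        return "low"                      # WHERE clause, but no PK filter
    return "normal"
```

Queries classified as "low" would then be scheduled at reduced priority, so the rest of the auditors keep a responsive database.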

So far, so good.

Alan wanted to build a mechanism to query the list of available tables, and the columns available in each. This could be provided via the API, which the auditors' developers could then programmatically use to create the objectified where-clause to submit as part of a query.

Maria would have nothing to do with that. Instead, she wanted to sit with each potential auditor and have them define every single query that they could possibly ever need (table(s), column(s), join(s), etc). Alan pointed out that the auditors could not possibly know this in advance until some issue arose and they had to find the data relevant to the issue. Since this would vary by issue, the queries would be different every time. As such, there was no way to hard-wire them into the API.

She put her foot down and demanded a specific list of queries since that was the only way to build an API.

Alan went to every auditor and asked for a list of all the queries they had issued in the past year. They grudgingly obliged.

Maria then went on to design each API function call with specific arguments required to execute the given underlying query. The results would then be returned in a dedicated POJO.

Again, Alan groaned that defining a POJO for each and every subset of columns was inappropriate; they should at least design the POJOs to handle the entire column set of the given table, and have getters that represented columns that were not requested as part of a given API query throw a column-not-queried exception. Maria said No and insisted on separate POJOs for each query.
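
Alan's rejected design, one result object per table covering the full column set, with a column-not-queried exception for columns the query didn't request, can be sketched like this (Python standing in for the story's Java POJOs; all class and column names are invented):

```python
class ColumnNotQueriedError(Exception):
    """Raised when code reads a column that wasn't part of the query."""

class TableRow:
    ALL_COLUMNS = ()  # overridden once per table, not per query

    def __init__(self, **queried):
        unknown = set(queried) - set(self.ALL_COLUMNS)
        if unknown:
            raise ValueError(f"not columns of this table: {unknown}")
        self._values = queried

    def __getattr__(self, name):
        # Valid column, but absent from this query: raise loudly instead
        # of silently returning nothing.
        if name in type(self).ALL_COLUMNS:
            try:
                return self._values[name]
            except KeyError:
                raise ColumnNotQueriedError(name) from None
        raise AttributeError(name)

class CustomerRow(TableRow):
    ALL_COLUMNS = ("customer_id", "name", "balance")
```

Adding a column to a query then requires no new result class at all, which is exactly the flexibility the per-query POJOs lacked.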

Some time later, Alan had finished building the API. Once it was tested and deployed, the other development teams built relevant GUIs to use it and allow the auditors to pick the desired query and appropriate parameters to pass to it.

This worked well until an auditor needed to add a column to one of the queries. If Maria had let Alan use table-wide column pick-lists and POJOs that had all the fields of a table, this would have been easy. However, she didn't, and made him create another virtually identical API function, but with a parameter for the additional column.

Then it happened with another query. And another. And another.

After a while, there were so many versions of the API that the managers of the other teams blasted her choice of implementation (they had to deal with the different versions of the POJOs for each table in their code too) and demanded that it be made sane.

Finally, under pressure from above, Maria relented and instructed Alan to change the API to use the pick lists and POJOs he had originally wanted to provide.

To implement this required changing the signature of every method in the API. Fearing a riot from his counterparts, he got them all together and offered a two month window during which both old and new versions of the method calls would be supported. This would give their teams a chance to make the code changes without forcing them to drop their current priorities. The other developers and managers quickly agreed to the dual-mode window and thanked Alan.

Then a few of the other managers made the mistake of thanking Maria for the window in which to make changes.

She royally reamed Alan: "Did I tell you to give them a dual-mode window? Did I? You will immediately pull the old methods from the API and re-deploy. You will NOT email the other teams about this. Get it done; NOW!"

Alan had worked very hard to develop a good working relationship with his peers and their respective managers. Now he had been ordered to do something that was downright nasty and would absolutely destroy said relationships.

Alan changed the API, ran the tests, and entered the command to deploy it, but did not hit ENTER.

Then he quietly went around to each of the other managers, told them what he had been instructed to do and apologized for what was about to happen. He was somewhat taken aback when every single one of them told him not to worry; they had dealt with Maria before, that they appreciated his well-intentioned but ill-fated attempt to be a team player, and that they completely understood.

After that, he went back to his desk, hit ENTER, and contemplated asking the other managers if they could use a good developer.


Planet DebianMike Hommey: Why is the git-cinnabar master branch slower to clone?

Apart from the memory considerations, one thing that the data presented in the “When the memory allocator works against you” post that I haven’t touched in the followup posts is that there is a large difference in the time it takes to clone mozilla-central with git-cinnabar 0.4.0 vs. the master branch.

One thing that was mentioned in the first followup is that reducing the amount of realloc and substring copies made the cloning more than 15 minutes faster on master. But the same code exists in 0.4.0, so this isn’t part of the difference.

So what’s going on? Looking at the CPU usage during the clone is enlightening.

On 0.4.0:

On master:

(Note: the data gathering is flawed in some ways, which explains why the git-remote-hg process goes above 100%, which is not possible for this python process. The data is however good enough for the high-level analysis that follows, so I didn’t bother to get something more accurate)

On 0.4.0, the git-cinnabar-helper process was saturating one CPU core during the File import phase, and the git-remote-hg process was saturating one CPU core during the Manifest import phase. Overall, the sum of both processes usually used more than one and a half core.

On master, however, the total of both processes barely uses more than one CPU core.

What happened?

This and that happened.

Essentially, before those changes, git-remote-hg would send instructions to git-fast-import (technically, git-cinnabar-helper, but in this case it’s only used as a wrapper for git-fast-import), and use marks to track the git objects that git-fast-import created.

After those changes, git-remote-hg asks git-fast-import the git object SHA1 of objects it just asked to be created. In other words, those changes replaced something asynchronous with something synchronous: while it used to be possible for git-remote-hg to work on the next file/manifest/changeset while git-fast-import was working on the previous one, it now waits.

The changes helped simplify the python code, but made the overall clone process much slower.

If I’m not mistaken, the only real use for that information is for the mapping of mercurial to git SHA1s, which is actually rarely used during the clone, except at the end, when storing it. So what I’m planning to do is to move that mapping to the git-cinnabar-helper process, which, incidentally, will kill not 2, but 3 birds with 1 stone:

  • It will restore the asynchronicity, obviously (at least, that’s the expected main outcome).
  • Storing the mapping in the git-cinnabar-helper process is very likely to take less memory than what it currently takes in the git-remote-hg process. Even if it doesn’t (which I doubt), that should still help stay under the 2GB limit of 32-bit processes.
  • The whole thing that spikes memory usage during the finalization phase, as seen in previous post, will just go away, because the git-cinnabar-helper process will just have prepared the git notes-like tree on its own.

So expect git-cinnabar 0.5 to get moar faster, and to use moar less memory.

Planet DebianMike Hommey: Analyzing git-cinnabar memory use

In previous post, I was looking at the allocations git-cinnabar makes. While I had the data, I figured I’d also look how the memory use correlates with expectations based on repository data, to put things in perspective.

As a reminder, this is what the allocations look like (horizontal axis being the number of allocator function calls):

There are 7 different phases happening during a git clone using git-cinnabar, most of which can easily be identified on the graph above:

  • Negotiation.

    During this phase, git-cinnabar talks to the mercurial server to determine what needs to be pulled. Once that is done, a getbundle request is emitted, which response is read in the next three phases. This phase is essentially invisible on the graph.

  • Reading changeset data.

    The first thing that a mercurial server sends in the response for a getbundle request is changesets. They are sent in the RevChunk format. Translated to git, they become commit objects. But to create commit objects, we need the entire corresponding trees and files (blobs), which we don’t have yet. So we keep this data in memory.

    In the git clone analyzed here, there are 345643 changesets loaded in memory. Their raw size in RawChunk format is 237MB. I think by the end of this phase, we made 20 million allocator calls, have about 300MB of live data in about 840k allocations. (No certainty because I don’t actually have definite data that would allow me to correlate between the phases and allocator calls, and the memory usage change between this phase and the next is not as clear-cut as with other phases). This puts us at less than 3 live allocations per changeset, with “only” about 60MB overhead over the raw data.

  • Reading manifest data.

    In the stream we receive, manifests follow changesets. Each changeset points to one manifest; several changesets can point to the same manifest. Manifests describe the content of the entire source code tree in a similar manner as git trees, except they are flat (there’s one manifest for the entire tree, where git trees would reference other git trees for subdirectories). And like git trees, they only map file paths to file SHA1s. The way they are currently stored by git-cinnabar (which is planned to change) requires knowing the corresponding git SHA1s for those files, and we haven’t got those yet, so again, we keep everything in memory.

    In the git clone analyzed here, there are 345398 manifests loaded in memory. Their raw size in RawChunk format is 1.18GB. By the end of this phase, we made 23 million more allocator calls, and have about 1.52GB of live data in about 1.86M allocations. We’re still at less than 3 live allocations for each object (changeset or manifest) we’re keeping in memory, and barely over 100MB of overhead over the raw data, which, on average puts the overhead at 150 bytes per object.

    The three phases so far are relatively fast and account for a small part of the overall process, so they don’t appear clear-cut to each other, and don’t take much space on the graph.

  • Reading and Importing files.

    After the manifests, we finally get files data, grouped by path, such that we get all the file revisions of e.g. .cargo/.gitignore, followed by all the file revisions of .cargo/, .clang-format, and so on. The data here doesn’t depend on anything else, so we can finally directly import the data.

    This means that for each revision, we actually expand the RawChunk into the full file data (RawChunks contain patches against a previous revision), and don’t keep the RawChunk around. We also don’t keep the full data after it was sent to the git-cinnabar-helper process (as far as cloning is concerned, it’s essentially a wrapper for git-fast-import), except for the previous revision of the file, which is likely the patch base for the next revision.

    We however keep in memory one or two things for each file revision: a mapping of its mercurial SHA1 and the corresponding git SHA1 of the imported data, and, when there is one, the file metadata (containing information about file copy/renames) that lives as a header in the file data in mercurial, but can’t be stored in the corresponding git blobs, otherwise we’d have irrelevant data in checkouts.

    On the graph, this is where there is a steady and rather long increase of both live allocations and memory usage, in stairs for the latter.

    In the git clone analyzed here, there are 2.02M file revisions, 78k of which have copy/move metadata for a cumulated size of 8.5MB of metadata. The raw size of the file revisions in RawChunk format is 3.85GB. The expanded data size is 67GB. By the end of this phase, we made 622 million more allocator calls, and peaked at about 2.05GB of live data in about 6.9M allocations. Compared to the beginning of this phase, that added about 530MB in 5 million allocations.

    File metadata is stored in memory as python dicts, with 2 entries each, instead of raw form for convenience and future-proofing, so that would be at least 3 allocations each: one for each value, one for the dict, and maybe one for the dict storage; their keys are all the same and are probably interned by python, so wouldn’t count.

    As mentioned above, we store a mapping of mercurial to git SHA1s, so for each file that makes 2 allocations, 4.04M total. Plus the 230k or 310k from metadata. Let’s say 4.45M total. We’re short 550k allocations, but considering the numbers involved, it would take less than one allocation per file on average to go over this count.

    As for memory size, per this answer on stackoverflow, python strings have an overhead of 37 bytes, so each SHA1 (kept in hex form) will take 77 bytes (Note, that’s partly why I didn’t particularly care about storing them as binary form, that would only save 25%, not 50%). That’s 311MB just for the SHA1s, to which the size of the mapping dict needs to be added. If it were a plain array of pointers to keys and values, it would take 2 * 8 bytes per file, or about 32MB. But that would be a hash table with no room for more items (By the way, I suspect the stairs that can be seen on the requested and in-use bytes is the hash table being realloc()ed). Plus at least 290 bytes per dict for each of the 78k metadata, which is an additional 22M. All in all, 530MB doesn’t seem too much of a stretch.

  • Importing manifests.

    At this point, we’re done receiving data from the server, so we begin by dropping objects related to the bundle we got from the server. On the graph, I assume this is the big dip that can be observed after the initial increase in memory use, bringing us down to 5.6 million allocations and 1.92GB.

    Now begins the most time consuming process, as far as mozilla-central is concerned: transforming the manifests into git trees, while also storing enough data to be able to reconstruct manifests later (which is required to be able to pull from the mercurial server after the clone).

    So for each manifest, we expand the RawChunk into the full manifest data, and generate new git trees from that. The latter is mostly performed by the git-cinnabar-helper process. Once we’re done pushing data about a manifest to that process, we drop the corresponding data, except when we know it will be required later as the delta base for a subsequent RevChunk (which can happen in bundle2).

    As with file revisions, for each manifest, we keep track of the mapping of SHA1s between mercurial and git. We also keep a DAG of the manifests history (contrary to git trees, mercurial manifests track their ancestry; files do too, but git-cinnabar doesn’t actually keep track of that separately; it just relies on the manifests data to infer file ancestry).

    On the graph, this is where the number of live allocations increases while both requested and in-use bytes decrease, noisily.

    By the end of this phase, we made about 1 billion more allocator calls. Requested allocations went down to 1.02GB, for close to 7 million live allocations. Compared to the end of the dip at the beginning of this phase, that added 1.4 million allocations, and released 900MB. By now, we expect everything from the “Reading manifests” phase to have been released, which means we allocated around 620MB (1.52GB – 900MB), for a total of 3.26M additional allocations (1.4M + 1.86M).

    We have a dict for the SHA1s mapping (345k * 77 * 2 for strings, plus the hash table with 345k items, so at least 60MB), and the DAG, which, now that I’m looking at memory usage, I figure has one of the worst possible structures, using 2 sets for each node (at least 232 bytes per set, that’s at least 160MB, plus 2 hash tables with 345k items). I think 250MB for those data structures would be largely underestimated. It’s not hard to imagine them taking 620MB, because really, that DAG implementation is awful. The number of allocations expected from them would be around 1.4M (4 * 345k), but I might be missing something. That’s way less than the actual number, so it would be interesting to take a closer look, but not before doing something about the DAG itself.

    Fun fact: the amount of data we’re dealing with in this phase (the expanded size of all the manifests) is close to 2.9TB (yes, terabytes). With about 4700 seconds spent on this phase on a real clone (less with the release branch), we’re still handling more than 615MB per second.

  • Importing changesets.

    This is where we finally create the git commits corresponding to the mercurial changesets. For each changeset, we expand its RawChunk, find the git tree we created in the previous phase that corresponds to the associated manifest, and create a git commit for that tree, with the right date, author, and commit message. For data that appears in the mercurial changeset that can’t be stored or doesn’t make sense to store in the git commit (e.g. the manifest SHA1, the list of changed files[*], or some extra metadata like the source of rebases), we keep some metadata we’ll store in git notes later on.

    [*] Fun fact: the list of changed files stored in mercurial changesets does not necessarily match the list of files in a `git diff` between the corresponding git commit and its parents, for essentially two reasons:

    • Old buggy versions of mercurial have generated erroneous lists that are now there forever (they are part of what makes the changeset SHA1).
    • Mercurial may create new revisions for files even when the file content is not modified, most notably during merges (but that also happened on non-merges due to, presumably, bugs).
    … so we keep it verbatim.

    On the graph, this is where both requested and in-use bytes are only slightly increasing.

    By the end of this phase, we made about half a billion more allocator calls. Requested allocations went up to 1.06GB, for close to 7.7 million live allocations. Compared to the end of the previous phase, that added 700k allocations, and 400MB. By now, we expect everything from the “Reading changesets” phase to have been released (at least the raw data we kept there), which means we may have allocated at most around 700MB (400MB + 300MB), for a total of 1.5M additional allocations (700k + 840k).

    All these are extra data we keep for the next and final phase. It’s hard to evaluate the exact size we’d expect here in memory, but if we divide by the number of changesets (345k), that’s less than 5 allocations per changeset and less than 2KB per changeset, which is low enough not to raise eyebrows, at least for now.

  • Finalizing the clone.

    The final phase is where we actually go ahead storing the mappings between mercurial and git SHA1s (all 2.7M of them), the git notes where we store the data necessary to recreate mercurial changesets from git commits, and a cache for mercurial tags.

    On the graph, this is where the requested and in-use bytes, as well as the number of live allocations peak like crazy (up to 21M allocations for 2.27GB requested).

    This is very much unwanted, but easily explained with the current state of the code. The way the mappings between mercurial and git SHA1s are stored is via a tree similar to how git notes are stored. So for each mercurial SHA1, we have a file that points to the corresponding git SHA1 through git links for commits or directly for blobs (look at the output of git ls-tree -r refs/cinnabar/metadata^3 if you’re curious about the details). If I remember correctly, it’s faster if the tree is created with an ordered list of paths, so the code created a list of paths, and then sorted it to send commands to create the tree. The former creates a new str of length 42 and a tuple of 3 elements for each and every one of the 2.7M mappings. With the 37 bytes overhead per str instance and the 56 + 3 * 8 bytes per tuple, we have at least 429MB wasted. Creating the tree itself keeps the corresponding fast-import commands in a buffer, where each command is going to be a tuple of 2 elements: a pointer to a method, and a str of length between 90 and 93. That’s at least another 440MB wasted.

    I already fixed the first half, but the second half still needs addressing.
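
The per-object overhead figures used in the estimates above can be sanity-checked empirically with `sys.getsizeof`. Exact numbers vary by Python version and platform (the post's figures, 37 bytes per str and roughly 290 bytes per small dict, are for a 64-bit Python 2), so this sketch only illustrates the method, not the exact values.

```python
import sys

# A SHA1 kept in hex form, as git-cinnabar does.
sha1_hex = "a" * 40
overhead = sys.getsizeof(sha1_hex) - 40
print("str overhead:", overhead)   # ~37 bytes on 64-bit Python 2

# A 2-entry metadata-style dict.
d = {"hg": sha1_hex, "git": sha1_hex}
print("2-entry dict:", sys.getsizeof(d))
```

Note that `sys.getsizeof` reports only the object's own footprint, not what its keys and values reference, which is why the post adds string and container sizes separately.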

Overall, except for the stupid spike during the final phase, the manifest DAG and the glibc allocator runaway memory use described in previous posts, there is nothing terribly bad with the git-cinnabar memory usage, all things considered. Mozilla-central is just big.

The spike is already half addressed, and work is under way for the glibc allocator runaway memory use. The manifest DAG, interestingly, is actually mostly useless. It’s only used to track the heads of the DAG, and it’s very much possible to track heads of a DAG without actually storing the entire DAG. In fact, that’s what git-cinnabar already does for changeset heads… so we would only need to do the same for manifest heads.

One could argue that the 1.4GB of raw RevChunk data we’re keeping in memory for later use could be kept on disk instead. I haven’t done this so far because I didn’t want to have to handle temporary files (and answer questions like “where to put them?”, “what if there isn’t enough disk space there?”, “what if disk access is slow?”, etc.). But the majority of this data is from manifests. I’m already planning changes in how git-cinnabar stores manifests data that will actually allow importing them directly, instead of keeping them in memory until files are imported. This would instantly remove 1.18GB of memory usage. The downside, however, is that this would be more CPU intensive: importing changesets will require creating the corresponding git trees, and getting the stored manifest data. I think it’s worth it, though.

Finally, one thing that isn’t obvious here, but that was found while analyzing why RSS would be going up despite memory usage going down, is that git-cinnabar is doing way too many reallocations and substring allocations.

So let’s look at two metrics that hopefully will highlight the problem:

  • The cumulated amount of requested memory. That is, the sum of all sizes ever given to malloc, realloc, calloc, etc.
  • The compensated cumulated amount of requested memory (naming is hard). That is, the sum of all sizes ever given to malloc, calloc, etc. except realloc. For realloc, we only count the delta in size between what the size was before and after the realloc.

Assuming all the requested memory is filled at some point, the former gives us an upper bound to the amount of memory that is ever filled or copied (the amount that would be filled if no realloc was ever in-place), while the latter gives us a lower bound (the amount that would be filled or copied if all reallocs were in-place).
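
The two metrics defined above can be computed from an allocation trace. The trace format here (a list of call-name, new-size, old-size events) is invented for illustration; only the accounting rule matches the post.

```python
def alloc_metrics(trace):
    """Compute (cumulated, compensated) requested-memory totals.

    cumulated:   every requested size counts in full (upper bound).
    compensated: realloc only counts its size delta (lower bound).
    """
    cumulated = 0
    compensated = 0
    for call, size, old_size in trace:
        cumulated += size
        if call == "realloc":
            compensated += size - old_size
        else:
            compensated += size
    return cumulated, compensated

trace = [
    ("malloc", 100, 0),
    ("realloc", 150, 100),   # if grown in place, only 50 extra bytes filled
    ("realloc", 400, 150),
]
```

A large gap between the two totals is exactly the symptom of heavy realloc use that the post is measuring.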

Ideally, we’d want the upper and lower bounds to be close to each other (indicating few realloc calls), and the total amount at the end of the process to be as close as possible to the amount of data we’re handling (which we’ve seen is around 3TB).

… and this is clearly bad. Like, really bad. But we already knew that from the previous post, although it’s nice to put numbers on it. The lower bound is about twice the amount of data we’re handling, and the upper bound is more than 10 times that amount. Clearly, we can do better.

We’ll see how things evolve after the necessary code changes happen. Stay tuned.

Planet Linux AustraliaOpenSTEM: Guess the Artefact!

Today we are announcing a new challenge for our readers – Guess the Artefact! We post pictures of an artefact and you can guess what it is. The text will slowly reveal the answer, through a process of examination and deduction – see if you can guess what it is, before the end. We are starting this challenge with an item from our year 6 Archaeological Dig workshop. Year 6 (unit 6.3) students concentrate on Federation in their Australian History segment – so that’s your first clue! Study the image and then start reading the text below.

OpenSTEM archaeological dig artefact (C) 2016 OpenSTEM Pty Ltd

Our first question is: what is it? Study the image and see if you can work out what it might be – it’s a dirty, damaged piece of paper. It seems to be old. Does it have a date? Ah yes, there are 3 dates – 23, 24 and 25 October, 1889, so we deduce that it must be old, dating to the end of the 19th century. We will file the exact date for later consideration. We also note references to railways. The layout of the information suggests a train ticket. So we have a late 19th century train ticket!

Now why do we have this train ticket and whose train ticket might it have been? The ticket is First Class, so this is someone who could afford to travel in style. Where were they going? The railways mentioned are Queensland Railways, Great Northern Railway, New South Wales Railways and the stops are Brisbane, Wallangara, Tenterfield and Sydney. Now we need to do some research. Queensland Railways and New South Wales Railways seem self-evident, but what is Great Northern Railway? A brief hunt reveals several possible candidates: 1) a contemporary rail operator in Victoria; 2) a line in Queensland connecting Mt Isa and Townsville and 3) an old, now unused railway in New South Wales. We can reject option 1) immediately. Option 2) is the right state, but the towns seem unrelated. That leaves option 3), which seems most likely. Looking into the NSW option in more detail we note that it ran between Sydney and Brisbane, with a stop at Wallangara to change gauge – Bingo!

Wallangara Railway Station

More research reveals that the line reached Wallangara in 1888, the year before this ticket was issued. Only after 1888 was it possible to travel from Brisbane to Sydney by rail, albeit with a compulsory stop at Wallangara. We note also that the ticket contains a meal voucher for dinner at the Railway Refreshment Rooms in Wallangara. Presumably passengers overnighted in Wallangara before continuing on to Sydney on a different train and rail gauge. Checking the dates on the ticket, we can see evidence of an overnight stop, as the next leg continues from Wallangara on the next day (24 Oct 1889). However, next we come to some important information. From Wallangara, the next leg of the journey represented by this ticket was only as far as Tenterfield. Looking on a map, we note that Tenterfield is only about 25 km away – hardly a day’s train ride, more like an hour or two at the most (steam trains averaged about 24 km/hr at the time). From this we deduce that the ticket holder wanted to stop at Tenterfield and continue their journey on the next day.

We know that we’re studying Australian Federation history, so the name Tenterfield should start to ring a bell – what happened in Tenterfield in 1889 that was relevant to Australian Federation history? The answer, of course, is that Henry Parkes delivered his Tenterfield Oration there, and the date? 24 October, 1889! If we look into the background, we quickly discover that Henry Parkes was on his way from Brisbane back to Sydney, when he stopped in Tenterfield. He had been seeking support for Federation from the government of the colony of Queensland. He broke his journey in Tenterfield, a town representative of those towns closer to the capital of another colony than their own, which would benefit from the free trade arrangements flowing from Federation. Parkes even discussed the issue of different rail gauges as something that would be solved by Federation! We can therefore surmise that this ticket may well be the ticket of Henry Parkes, documenting his journey from Brisbane to Sydney in October, 1889, during which he stopped and delivered the Tenterfield Oration!

This artefact is therefore relevant as a source for anyone studying Federation history – as well as giving us a more personal insight into the travels of Henry Parkes in 1889, it allows us to consider aspects of life at the time:

  • the building of railway connections across Australia, in a time before motor cars were in regular use;
  • the issue of different size railway gauges in the different colonies and what practical challenges that posed for a long distance rail network;
  • the ways in which people travelled and the speed with which they could cross large distances;
  • what rail connections would have meant for small, rural towns, to mention just a few.
  • Why might the railway companies have provided meal vouchers?

These are all sidelines of inquiry, which students may be interested to pursue, and which might help them to engage with the subject matter in more detail.

In our Archaeological Dig Workshops, we not only engage students in the processes and physical activities of the dig, but we provide opportunities for them to use the artefacts to practise deduction, reasoning and research – true inquiry-based learning, imitating real-world processes and far more engaging and empowering than more traditional bookwork.


Krebs on SecurityeBay Asks Users to Downgrade Security

Last week, KrebsOnSecurity received an email from eBay. The company wanted me to switch from using a hardware key fob when logging into eBay to receiving a one-time code sent via text message. I found it remarkable that eBay, which at one time was well ahead of most e-commerce companies in providing more robust online authentication options, is now essentially trying to downgrade my login experience to a less-secure option.

In early 2007, PayPal (then part of the same company as eBay) began offering its hardware token for a one-time $5 fee, and at the time the company was among very few that were pushing this second factor (something you have) in addition to passwords for user authentication. In fact, I wrote about this development back when I was a reporter at The Washington Post:

“Armed with one of these keys, if you were to log on to your account from an unfamiliar computer and some invisible password stealing program were resident on the machine, the bad guys would still be required to know the numbers displayed on your token, which of course changes every 30 seconds. Likewise, if someone were to guess or otherwise finagle your PayPal password.”

The PayPal security key.

I’ve still got the same hardware token I ordered when writing about that offering, and it’s been working well for the past decade. Now, eBay is asking me to switch from the key fob to text messages, the latter being a form of authentication that security experts say is less secure than other forms of two-factor authentication (2FA).
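For background on how such a token works: the number that changes every 30 seconds is typically an HOTP/TOTP code (RFC 4226 / RFC 6238), an HMAC over a time-derived counter, truncated to a few digits. Here is a minimal sketch of the standard algorithm; this is not PayPal's or Verisign's actual implementation, whose details may differ:

```python
import hashlib
import hmac
import struct
import time

def hotp(secret, counter, digits=6):
    """RFC 4226 HMAC-based one-time password."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble picks a 4-byte window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret, period=30, digits=6):
    """RFC 6238: same construction, with the counter derived from the clock."""
    return hotp(secret, int(time.time()) // period, digits)

# RFC 4226 Appendix D test vector: ASCII secret "12345678901234567890", counter 1.
print(hotp(b"12345678901234567890", 1))  # 287082
```

The security property is that the shared secret never leaves the token or the server, so unlike an SMS code there is nothing in transit to intercept.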

The move by eBay comes just months after the National Institute of Standards and Technology (NIST) released a draft of new authentication guidelines that appear to be phasing out the use of SMS-based two-factor authentication. NIST said one-time codes that are texted to users over a mobile phone are vulnerable to interception, noting that thieves can divert the target’s SMS messages and calls to another device (either by social engineering a customer service person at the phone company, or via more advanced attacks like SS7 hacks).

I asked eBay to explain their rationale for suggesting this switch. I received a response suggesting the change was more about bringing authentication in-house (the security key is made by Verisign) and that eBay hopes to offer additional multi-factor authentication options in the future.

“As a company, eBay is committed to providing a safe and secure marketplace for our millions of customers around the world,” eBay spokesman Ryan Moore wrote. “Our product team is constantly working on establishing new short-term and long-term, eBay-owned factors to address our customer’s security needs. To that end, we’ve launched SMS-based 2FA as a convenient 2FA option for eBay customers who already had hardware tokens issued through PayPal. eBay continues to work on advancing multi-factor authentication options for our users, with the end goal of making every solution more secure and more convenient. We look forward to sharing more as additional solutions are ready to launch.”

I think I’ll keep my key fob and continue using that for two-factor authentication on both PayPal and eBay, thank you very much. It’s not clear whether eBay is also phasing out the use of Symantec’s VIP Security Key App, which has long offered eBay and PayPal users alike more security than a texted one-time code. eBay did not respond to specific questions regarding this change.

Although SMS is not as secure as other forms of 2FA, it is probably better than nothing. Are you taking advantage of two-factor authentication wherever it is offered? The site maintains a fairly comprehensive list of companies that offer two-step or two-factor authentication.

Planet DebianArturo Borrero González: IPv6 and CGNAT


Today I finished reading an interesting article by the fourth-largest Spanish ISP regarding IPv6 and CGNAT. The article is in Spanish, but I will translate the most important statements here.

Having a Spanish Internet operator talk about this subject is itself good news. We have been lacking any news regarding IPv6 in our country for years. I mean, no news from private operators. Public networks like the one where I do my daily job have been offering native IPv6 for almost a decade…

The title of the article is “What is CGNAT and why is it used”.

They start by admitting that this technique is used to address the issue of IPv4 exhaustion. Good. They move on to say that IPv6 was designed to address IPv4 exhaustion. Great. Then, they state that “the internet network is not ready for IPv6 support”. Also that “IPv6 has the handicap of many websites not supporting it”. Sorry?

That is not true. If they refer to the core of the internet (i.e., RIRs, internet exchange points, root DNS servers, core BGP routers, etc.), it has been working with IPv6 for ages now. If they refer to something else, for example Google, Wikipedia, Facebook, Twitter, Youtube, Netflix or any random hosting company, they support IPv6 as well. Hosting companies which don’t support IPv6 are only a few, at least here in Europe.

The traffic to/from these services clearly makes up the vast majority of the traffic on the wires nowadays. And they support IPv6.

The article continues defending CGNAT. They refer to IPv6 as an alternative to CGNAT. No, sorry, CGNAT is an alternative to you not doing your IPv6 homework.

The article ends by insinuating that CGNAT is more secure and useful than IPv6. That’s the final joke. They mention some absurd example of IP cams being accessed from the internet by anyone.

Sure, by using CGNAT you are indeed making the network practically one-way only. RFC 7021 describes the big issues of a CGNAT network. So, by using CGNAT you sacrifice a lot of usability in the name of security. This supposed security can be replicated by the simplest possible firewall, which could be deployed in Dual Stack IPv4/IPv6 using any modern firewalling system, like nftables.
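As an aside for readers wondering whether their own connection sits behind a CGNAT: one common tell is the WAN address the router gets from the ISP falling inside 100.64.0.0/10, the shared address space RFC 6598 reserves for carrier-grade NAT. This is easy to check with Python's standard ipaddress module:

```python
import ipaddress

# RFC 6598 shared address space, reserved for carrier-grade NAT.
CGNAT_RANGE = ipaddress.ip_network("100.64.0.0/10")

def behind_cgnat(wan_address):
    """True if the WAN-side address is in the CGNAT shared space."""
    addr = ipaddress.ip_address(wan_address)
    return addr.version == 4 and addr in CGNAT_RANGE

print(behind_cgnat("100.72.1.2"))   # True: inside 100.64.0.0/10
print(behind_cgnat("203.0.113.7"))  # False: an ordinary public IPv4 address
print(behind_cgnat("2001:db8::1"))  # False: IPv6 needs no CGNAT
```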

(Here is a good blogpost of RFC7021 for spanish readers: Midiendo el impacto del Carrier-Grade NAT sobre las aplicaciones en red)

By the way, Google kindly provides some statistics regarding their IPv6 traffic. These stats clearly show an exponential growth:

Google IPv6 traffic

Other ISPs are giving IPv6 strong precedence over IPv4; that’s the case with Verizon in the USA: Verizon Static IP Changes IPv4 to Persistent Prefix IPv6.

My article seems a bit like a rant, but I couldn’t miss the opportunity to call for native IPv6. None of the major Spanish ISPs offer IPv6.

CryptogramNSA Best Scientific Cybersecurity Paper Competition

Every year, the NSA has a competition for the best cybersecurity paper. Winners get to go to the NSA to pick up the award. (Warning: you will almost certainly be fingerprinted while you're there.)

Submission guidelines and nomination page.

Planet DebianMichael Stapelberg: Debian stretch on the Raspberry Pi 3 (update)

I previously wrote about my Debian stretch preview image for the Raspberry Pi 3.

Now, I’m publishing an updated version, containing the following changes:

  • A new version of the upstream firmware makes the Ethernet MAC address persist across reboots.
  • Updated initramfs files (without updating the kernel) are now correctly copied to the VFAT boot partition.
  • The initramfs’s file system check now works as the required fsck binaries are now available.
  • The root file system is now resized to fill the available space of the SD card on first boot.
  • SSH access is now enabled, restricted via iptables to local network source addresses only.
  • The image uses the linux-image-4.9.0-2-arm64 4.9.13-1 kernel.

A couple of issues remain, notably the lack of HDMI, WiFi and Bluetooth support (see wiki:RaspberryPi3 for details). Any help with fixing these issues is very welcome!

As a preview version (i.e. unofficial, unsupported, etc.) until all the necessary bits and pieces are in place to build images in a proper place in Debian, I built and uploaded the resulting image. Find it at To install the image, insert the SD card into your computer (I’m assuming it’s available as /dev/sdb) and copy the image onto it:

$ wget
$ sudo dd if=2017-03-22-raspberry-pi-3-stretch-PREVIEW.img of=/dev/sdb bs=5M

If resolving client-supplied DHCP hostnames works in your network, you should be able to log into the Raspberry Pi 3 using SSH after booting it:

$ ssh root@rpi3
# Password is “raspberry”

Planet DebianDirk Eddelbuettel: Suggests != Depends

A number of packages on CRAN use Suggests: casually.

They list other packages as "not required" in Suggests: -- as opposed to absolutely required via Imports: or the older Depends: -- yet do not test for their use in either examples or, more commonly, unit tests.

So e.g. the unit tests are bound to fail because, well, Suggests != Depends.
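The same mistake exists outside R. As a loose Python analogy (the package name fancyplot is invented for illustration), a test suite has to guard any use of an optional dependency so that a lean install skips it instead of failing:

```python
import importlib.util

def have(pkg):
    """True if an optional package is importable in this environment."""
    return importlib.util.find_spec(pkg) is not None

# Hypothetical optional dependency "fancyplot": skip, don't fail, without it.
if have("fancyplot"):
    import fancyplot  # noqa: F401
    # ... exercise the optional integration here ...
else:
    print("fancyplot not installed; skipping optional tests")
```

This is exactly the `if(require("pkgname"))` conditioning that Writing R Extensions asks for, translated into another ecosystem's idiom.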

This has been accommodated for many years by all parties involved by treating Suggests as a Depends and installing unconditionally. As I understand it, CRAN appears to flip a switch to automatically install all Suggests from major repositories, glossing over what I consider to be a packaging shortcoming. (As an aside, treatment of Additional_repositories: is indeed optional; Brooke Anderson and I have a fine paper under review on this.)

I spend a fair amount of time with reverse dependency ("revdep") checks of packages I maintain, and I will no longer accommodate these packages.

These revdep checks take long enough as it is, so I will now blacklist these packages that are guaranteed to fail when their "optional" dependencies are not present.

Writing R Extensions says in Section 1.1.3

All packages that are needed to successfully run R CMD check on the package must be listed in one of ‘Depends’ or ‘Suggests’ or ‘Imports’. Packages used to run examples or tests conditionally (e.g. via if(require(pkgname))) should be listed in ‘Suggests’ or ‘Enhances’. (This allows checkers to ensure that all the packages needed for a complete check are installed.)

In particular, packages providing “only” data for examples or vignettes should be listed in ‘Suggests’ rather than ‘Depends’ in order to make lean installations possible.


It used to be common practice to use require calls for packages listed in ‘Suggests’ in functions which used their functionality, but nowadays it is better to access such functionality via :: calls.

and continues in Section

Note that someone wanting to run the examples/tests/vignettes may not have a suggested package available (and it may not even be possible to install it for that platform). The recommendation used to be to make their use conditional via if(require("pkgname"))): this is fine if that conditioning is done in examples/tests/vignettes.

I will now exercise my option to use 'lean installations' as discussed here. If you want your package included in tests I run, please make sure it tests successfully when only its required packages are present.

Google AdsenseHelp our team help you

We always hear from publishers that you’re looking for actionable tips and best practices for growing your site. Our team of product, monetization and website optimization experts is constantly building new resources and insights to help publishers like you grow your business. We share personalized suggestions for your website via email, in addition to news on the latest product features. We also send periodic invitations to live video sessions and events offering personalized tips for your region.

To ensure you’re getting this information from us, we need to have up-to-date contact details and communication preferences for you. Here are a few quick and simple steps you should take when you log in to your AdSense account, to make sure we can get in touch:

Step 1:
Make sure that the email address you've listed in your account to receive communications from us is correct and is one that you’re regularly checking. You can check which email address we’re using to reach you by logging in to your AdSense account and going to your Personal settings under the Account section on the left menu bar.

Step 2:
Opt in to receive emails from us. We categorize the emails we send based on the content they include. Here’s a quick breakdown:
  • Customized help and performance suggestions - Includes personalized revenue and optimization tips customized specifically for your website.
  • Periodic newsletters with tips and best practices - Includes general AdSense tips, best practices and product updates.
  • Occasional surveys to help Google improve your AdSense experience - Gives you a regular opportunity to share your feedback on AdSense.
  • Special Offers - Includes invitations to events in your country and live YouTube events.
  • Information about other Google products and services which may be of interest to you - Occasional updates on other Google products like Google Analytics or DoubleClick for Publishers.
You can update your email preferences by checking the boxes in your Personal settings section.

Step 3:
Choose your language preference. Did you know that AdSense emails are available in 38 different languages? We’ve recently added Hindi, Malay and Filipino. You can choose the language that suits you best by using the Display language drop-down menu directly under the email preferences checkboxes.

Get the most from the AdSense team and ensure you can hear from us. Log in to your account now and check your contact details and communication preferences. It takes no more than two minutes. 

Posted by Suzy Headon - Inside AdSense Team

Planet Linux AustraliaLinux Users of Victoria (LUV) Announce: LUV Beginners April Meeting: TBD

Apr 15 2017 12:30
Apr 15 2017 16:30
Infoxchange, 33 Elizabeth St. Richmond

Meeting topic to be announced.

There will also be the usual casual hands-on workshop, Linux installation, configuration and assistance and advice. Bring your laptop if you need help with a particular issue. This will now occur BEFORE the talks from 12:30 to 14:00. The talks will commence at 14:00 (2pm) so there is time for people to have lunch nearby.

The meeting will be held at Infoxchange, 33 Elizabeth St. Richmond 3121 (enter via the garage on Jonas St.) Late arrivals, please call (0421) 775 358 for access to the venue.

LUV would like to acknowledge Infoxchange for the venue.

Linux Users of Victoria Inc., is an incorporated association, registration number A0040056C.


Planet Linux AustraliaLinux Users of Victoria (LUV) Announce: LUV Main April 2017 Meeting: SageMath / Simultaneous multithreading

Apr 4 2017 18:30
Apr 4 2017 20:30
The Dan O'Connell Hotel, 225 Canning Street, Carlton VIC 3053




• Adetokunbo "Xero" Arogbonlo, SageMath
• Stewart Smith, Simultaneous multithreading


Food and drinks will be available on premises.

Before and/or after each meeting those who are interested are welcome to join other members for dinner.

Linux Users of Victoria Inc., is an incorporated association, registration number A0040056C.


CryptogramNew Paper on Encryption Workarounds

I have written a paper with Orin Kerr on encryption workarounds. Our goal wasn't to make any policy recommendations. (That was a good thing, since we probably don't agree on any.) Our goal was to present a taxonomy of different workarounds, and discuss their technical and legal characteristics and complications.

Abstract: The widespread use of encryption has triggered a new step in many criminal investigations: the encryption workaround. We define an encryption workaround as any lawful government effort to reveal an unencrypted version of a target's data that has been concealed by encryption. This essay provides an overview of encryption workarounds. It begins with a taxonomy of the different ways investigators might try to bypass encryption schemes. We classify six kinds of workarounds: find the key, guess the key, compel the key, exploit a flaw in the encryption software, access plaintext while the device is in use, and locate another plaintext copy. For each approach, we consider the practical, technological, and legal hurdles raised by its use.

The remainder of the essay develops lessons about encryption workarounds and the broader public debate about encryption in criminal investigations. First, encryption workarounds are inherently probabilistic. None work every time, and none can be categorically ruled out every time. Second, the different resources required for different workarounds will have significant distributional effects on law enforcement. Some techniques are inexpensive and can be used often by many law enforcement agencies; some are sophisticated or expensive and likely to be used rarely and only by a few. Third, the scope of legal authority to compel third-party assistance will be a continuing challenge. And fourth, the law governing encryption workarounds remains uncertain and underdeveloped. Whether encryption will be a game-changer or a speed bump depends on both technological change and the resolution of important legal questions that currently remain unanswered.

The paper is finished, but we'll be revising it once more before final publication. Comments are appreciated.

Sociological ImagesAbout a Boy–On the Sociological Relevance of Calvin (and Hobbes)

Originally posted at Inequality by (Interior) Design.

One of my favorite sociologists is Bill Watterson. He’s not read in most sociology classrooms, but he has a sociological eye and a great talent for laying bare the structure of the world around us and the ways that we as individuals must navigate that structure—some with fewer obstacles than others. Unlike most sociologists, Watterson does this without inventing new jargon (or much new jargon), or relying on overly dense theoretical claims. He doesn’t call our attention to demographic trends (often) or seek to find and explain low p values.

Rather, Watterson presents the world from the perspective of a young boy who is both tremendously influenced by–and desires to have a tremendous influence on–the world around him. The boy’s name is Calvin, and I put a picture of him (often in the company of his stuffed tiger, Hobbes) on almost every syllabus I write. Watterson is the artist behind the iconic comic strip, “Calvin and Hobbes,” and he firmly believed in his art form. I’m convinced that if you can’t find a Calvin and Hobbes cartoon to put on your syllabus for a sociology course, there’s a good chance you’re not teaching sociology.

The questions and perspectives of children are significant to sociologists because children offer us an amazing presentation of how much is learned, and how we come to take what we’ve learned for granted. In many ways, this is at the heart of the ethnographic project: to uncover both what is taken for granted and why this might matter. Using the charm and wit of a megalomaniacal young boy, Watterson challenges us on issues of gender inequality, sexual socialization, religious identity and ideology, racism, classism, ageism, deviance, the logic of capitalism, globalization, education, academic inquiry, philosophy, postmodernism, family forms and functions, the social construction of childhood, environmentalism, and more.

Watterson depicts the world from Calvin’s perspective. He manages to illustrate both how odd this perspective appears to others around him (his parents, teachers, peers, even Hobbes) as well as the tenacity with which Calvin clings to his unique view of the world despite the fact that it often fails to accord with reality. Indeed, Calvin ritualistically comes into conflict with his social obligations as a child (school, chores, social etiquette, and norms of deference and respect, etc.) and the diverse roles he plays as a social actor (both real and imaginary). Calvin is a wonderful example of the human capacity to play with social “roles” and within the social institutions that frame and structure our lives.

Quite simply, Calvin often refuses to play the social roles assigned to him, or, somewhat more mildly, he refuses to play those roles in precisely the way they were designed to be played. And in that way, Calvin helps to illustrate just how social our behavior is. Social behavior is based on a series of structured negotiations with the world around us. This doesn’t have to mean that we can act however we please—Calvin is continually bumping into social sanctions for his antics. But neither does it mean that we only act in ways that were structurally predetermined. The world around us is a collective project, one in which we have a stake. We play a role in both social reproduction and change.

Understanding the ways in which our experiences, identities, opportunities, and more are structured by the world around us is a central feature of sociological learning. Calvin is one way I ask students to consider these ideas. Kai Erikson put it this way in an essay on sociological writing:

Most sociologists think of their discipline as an approach as well as a subject matter, a perspective as well as a body of knowledge. What distinguishes us from other observers of the human scene is the manner in which we look out at the world—the way our eyes are focused, the way our minds are tuned, the way our intellectual reflexes are set. Sociologists peer out at the same landscapes as historians or poets or philosophers, but we select different details out of those scenes to attend closely, and we sort those details in different ways. So it is not only what sociologists see but the way they look that gives the field its special distinction. (here)

Calvin is a great example of the significance of “breaching” social norms. Breaches tell us something important about what we take for granted, and if your sociological imagination is well-oiled, you can often learn something about the “how” and the “why” as well. A great deal of the social organization that goes into the production of our experiences, identities, and opportunities is subtly disguised by these “whys” (or what are sometimes called “accounts”). Calvin’s incessant questioning of authority and social norms illuminates the social forces that guide our accounts surrounding a great deal of social life–subtly, but unmistakably, asking us to consider what we take for granted, how we manage to do so, as well as why. This is a feature of some of the best sociological work—a feature that is dramatized in childhood.

In Erikson’s treatise on sociological writing, he concludes with a wonderful description of an interaction between Mark Twain and a “wily old riverboat pilot.” Researching life on the Mississippi, Twain noticed that the riverboat pilot deftly swerves and changes course down the river, dodging unseen objects below the water’s surface in an attempt to move smoothly down the river. Twain asks the pilot what he’s noticing on the water’s surface to make these decisions and adjustments. The riverboat pilot is unable to explain, offering a sort of “I know it when I see it” explanation (interviewers know this explanation well). The pilot’s eyes had become so skilled in this navigation that he didn’t need to concern himself with how he knew what he knew. But both he and Twain were confident that he knew it. Over the course of their interactions, Twain gradually comes to learn more about what exactly the riverboat pilot is able to see and how he uses it to move through the water unhindered. This, explains Erikson, is the project of good sociology—“to combine the eyes of a river pilot with the voice of Mark Twain.”

Through Calvin, Watterson accomplishes just this. Calvin offers us a glimpse of wonderful array of sociological ideas and perspectives in an accessible way. Watterson has a way of seamlessly calling our attention to the taken for granted throughout social life and his images and ideas are a great introduction to sociological thinking. I like to think that Calvin’s life, perspectives, antics, and waywardness help students call the systems of social inequality and the world around them into question, learning to see sociologically. Calvin is a great tool to help students recognize that they can question the unquestionable, to learn to problematize issues that might lack the formal status of “problems” in the first place. Watterson used Calvin to help all of us learn to see the ordinary as extraordinary–a worthy task for any sociology course.

Tristan Bridges, PhD is a professor at The College at Brockport, SUNY. He is the co-editor of Exploring Masculinities: Identity, Inequality, Continuity, and Change with C.J. Pascoe and studies gender and sexual identity and inequality. You can follow him on Twitter here. Tristan also blogs regularly at Inequality by (Interior) Design.


Worse Than FailureCodeSOD: Dictionary Definition of a Loop

Ah, the grand old Dictionary/Map structure. It’s so useful, languages like Python secretly implement most of their objects using it, and JavaScript objects imitate it. One of its powers is that it allows you to create a sparse array, indexed by any data type you want to index it by.

Catherine’s cow-orker certainly thought this was pretty great, so they went ahead and used the Dictionary to map interest rates to years. Years, for this application, were not tracked as actual years, but relative to an agreed upon “year zero”- the first year of the company’s operation. There was a new annual interest rate tracked for each year since.

If you’re saying to yourself, “wait… this sounds like a use case for arrays…”, you’re onto something. Pity you didn’t work with Catherine. You probably wouldn’t have left this behind:

private static double FindInterestRate(int operationYear, Dictionary<int, double> yearToInterestRates) // where 0 is the first year
{
    if (operationYear < 0)
        return 0;
    for (int i = 1; i < yearToInterestRates.Count; i++)
        if (operationYear < yearToInterestRates.ElementAt(i).Key - 1)
            return yearToInterestRates.ElementAt(i - 1).Value;
    return yearToInterestRates.Last().Value;
}

Now, even if you don’t know C#, this is obviously pretty bad, but it’s actually worse than you think. Let’s talk for a minute about the ElementAt method. Accessing a key in a dictionary is an O(1) operation, but that’s not what ElementAt does. ElementAt finds elements by index, essentially treating this Dictionary like an array. And how does ElementAt actually find elements in a non-linear structure? By iterating, meaning ElementAt is an O(n) operation, making this loop O(n²).

Remember, our goal is to find a specific index in an array. Compare the efficiency.
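For contrast, here is a rough sketch (in Python, with hypothetical names, not the article’s actual codebase) of the direct lookup the data structure already supports; since the keys are the years themselves, no scanning is needed:

```python
def find_interest_rate(operation_year, year_to_rates):
    """Hypothetical sketch: a direct O(1) lookup instead of the O(n²) ElementAt scan.

    Assumes year_to_rates maps each operation year (0 = first year) to its rate.
    """
    if operation_year < 0:
        return 0.0
    if operation_year in year_to_rates:
        return year_to_rates[operation_year]
    # Past the last tracked year: fall back to the most recent rate.
    return year_to_rates[max(year_to_rates)]
```

With dense per-year keys, a plain list indexed by year would of course work just as well, which is the article’s point.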

[Advertisement] Universal Package Manager – store all your Maven, NuGet, Chocolatey, npm, Bower, TFS, TeamCity, Jenkins packages in one central location. Learn more today!

Planet DebianMike Hommey: When the memory allocator works against you, part 2

This is a followup to the “When the memory allocator works against you” post from a few days ago. You may want to read that one first if you haven’t, and come back. In case you don’t or didn’t read it, it was all about memory consumption during a git clone of the mozilla-central mercurial repository using git-cinnabar, and how the glibc memory allocator is using more than one would expect. This post is going to explore how/why it’s happening.

I happen to have written a basic memory allocation logger for Firefox, so I used it to log all the allocations happening during a git clone exhibiting the runaway memory increase behavior (using a python that doesn’t use its own allocator for small allocations).

The result was a 6.5 GB log file (compressed with zstd; 125 GB uncompressed!) with 2.7 billion calls to malloc, calloc, free, and realloc, recorded across (mostly) 2 processes (the python git-remote-hg process and the native git-cinnabar-helper process; there are other short-lived processes involved, but they do less than 5000 calls in total).

The vast majority of those 2.7 billion calls is done by the python git-remote-hg process: 2.34 billion calls. We’ll only focus on this process.

Replaying those 2.34 billion calls with a program that reads the log allowed me to reproduce the runaway memory increase behavior to some extent. I went an extra mile and modified glibc’s realloc code in memory so it doesn’t call memcpy, to make things faster. I also ran under setarch x86_64 -R to disable ASLR for reproducible results (two consecutive runs return the exact same numbers, which doesn’t happen with ASLR enabled).

I also modified the program to report the number of live allocations (allocations that haven’t been freed yet), and the cumulative size of the actually requested allocations (that is, the sum of all the sizes given to malloc, calloc, and realloc calls for live allocations, as opposed to what the memory allocator really allocated, which can be more, per malloc_usable_size).

RSS was not tracked because, to make things faster, the allocations are never filled; as a result, pages for large allocations are never dirtied, and RSS doesn’t grow as much because of that.

Full disclosure: it turns out the “system bytes” and “in-use bytes” numbers I had been collecting in the previous post were smaller than what they should have been, and were excluding memory that the glibc memory allocator would have mmap()ed. That however doesn’t affect the trends that had been witnessed. The data below is corrected.

(Note that in the graph above and the graphs that follow, the horizontal axis represents the number of allocator function calls performed)

While I was here, I figured I’d check how mozjemalloc performs, and it has a better behavior (although it has more overhead).

What doesn’t appear on this graph, though, is that mozjemalloc also tells the OS to drop some pages even if it keeps them mapped (madvise(MADV_DONTNEED)), so in practice, it is possible the actual RSS decreases too.

And jemalloc 4.5:

(It looks like it has better memory usage than mozjemalloc for this use case, but its stats are being thrown off at some point, I’ll have to investigate)

Going back to the first graph, let’s get a closer look at what the allocations look like when the “system bytes” number is increasing a lot. The highlights in the following graphs indicate the range the next graph will be showing.

So what we have here is a bunch of small allocations (small enough that they don’t seem to move the “requested” line; most under 512 bytes, so under normal circumstances they would be allocated by python, a few between 512 and 2048 bytes), and a few large allocations, one of which triggers a bump in memory use.

What can appear weird at first glance is that we have a large allocation not requiring more system memory, later followed by a smaller one that does. What the allocations actually look like is the following:

void *ptr0 = malloc(4850928); // #1391340418
void *ptr1 = realloc(some_old_ptr, 8000835); // #1391340419
free(ptr0); // #1391340420
ptr1 = realloc(ptr1, 8000925); // #1391340421
/* ... */
void *ptrn = malloc(879931); // #1391340465
ptr1 = realloc(ptr1, 8880819); // #1391340466
free(ptrn); // #1391340467

As it turns out, inspecting all the live allocations at that point, while there was a hole large enough to do the first two reallocs (the second actually happens in place), at the point of the third one, there wasn’t a large enough hole to fit 8.8MB.

What inspecting the live allocations also reveals, is that there is a large number of large holes between all the allocated memory ranges, presumably coming from previous similar patterns. There are, in fact, 91 holes larger than 1MB, 24 of which are larger than 8MB. It’s the accumulation of those holes that can’t be used to fulfil larger allocations that makes the memory use explode. And there aren’t enough small allocations happening to fill those holes. In fact, the global trend is for less and less memory to be allocated, so, smaller holes are also being poked all the time.

Really, it’s all a straightforward case of memory fragmentation. The reason it tends not to happen with jemalloc is that jemalloc groups allocations by sizes, which the glibc allocator doesn’t seem to be doing. The following is how we got a hole that couldn’t fit the 8.8MB allocation in the first place:

ptr1 = realloc(ptr1, 8880467); // #1391324989; ptr1 is 0x5555de145600
/* ... */
void *ptrx = malloc(232); // #1391325001; ptrx is 0x5555de9bd760; that is 13 bytes after the end of ptr1.
/* ... */
free(ptr1); // #1391325728; this leaves a hole of 8880480 bytes at 0x5555de145600.

All would go well if ptrx were free()d, but it looks like it’s long-lived. At least, it’s still allocated by the time we reach the allocator call #1391340466. And since the hole is 339 bytes too short for the requested allocation, the allocator has no choice but to request more memory from the system.

What’s bothersome, though, is that the allocator chose to allocate ptrx in the space following ptr1, when it allocated similarly sized buffers in completely different places after allocating ptr1 and before allocating ptrx, and while there are plenty of holes in the allocated memory where it could fit.

Interestingly enough, ptrx is a 232 byte allocation, which means under normal circumstances, python itself would be allocating it. In all likelihood, when the python allocator is enabled, it’s allocations larger than 512 bytes that become obstacles to the larger allocations. Another possibility is that the 256KB fragments that the python allocator itself allocates to hold its own allocations become the obstacles (my original hypothesis). I think the former is more likely, though, putting the blame back entirely on glibc’s shoulders.
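As a toy illustration of the size-class idea mentioned above (a simplified sketch, not jemalloc’s actual class spacing): if every request is rounded up to a fixed class, a freed slot can later be reused by any request in the same class, so a 232-byte allocation like ptrx would live among other small allocations instead of poking a tiny obstacle into a region of multi-megabyte buffers.

```python
def size_class(n):
    # Toy size classes: powers of two up to 4 KiB, then 4 KiB multiples.
    # A simplified sketch, not jemalloc's real algorithm.
    if n <= 8:
        return 8
    if n <= 4096:
        return 1 << (n - 1).bit_length()  # round up to the next power of two
    return -(-n // 4096) * 4096           # round up to a 4 KiB multiple
```

Under this scheme, ptrx (232 bytes) maps to the 256-byte class, segregated from the 8 MB-class reallocs that keep needing contiguous room.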

Now, it looks like the allocation pattern we have here is suboptimal, so I re-ran a git clone under a debugger to catch when a realloc() for 8880819 bytes happens (the size is peculiar enough that it only happened once in the allocation log). But doing that with a conditional breakpoint is just too slow, so I injected a realloc wrapper with LD_PRELOAD that sends a SIGTRAP signal to the process, so that an attached debugger can catch it.

Thanks to the support for python in gdb, it was then possible to pinpoint the exact python instructions that made the realloc() call (it didn’t come as a surprise; in fact, that was one of the places I had in mind, but I wanted definite proof):

new = ''
end = 0
# ...
for diff in RevDiff(rev_patch):
    new += data[end:diff.start]
    new += diff.text_data
    end = diff.end
    # ...
new += data[end:]

What happens here is that we’re creating a mercurial manifest we got from the server in patch form against a previous manifest. So data contains the original manifest, and rev_patch the patch. The patch essentially contains instructions of the form “replace n bytes at offset o with the content c“.

The code here just does that in the most straightforward way, implementation-wise, but also, it turns out, possibly the worst way.

So let’s unroll this loop over a couple iterations:

new = ''

This allocates an empty str object. In fact, this doesn’t actually allocate anything, and only creates a pointer to an interned representation of an empty string.

new += data[0:diff.start]

This is where things start to get awful. data is a str, so data[0:diff.start] creates a new, separate, str for the substring. One allocation, one copy.

It then appends that substring to new. Fortunately, CPython is smart enough, and just assigns data[0:diff.start] to new. This can easily be verified with the CPython REPL:

>>> foo = ''
>>> bar = 'bar'
>>> foo += bar
>>> foo is bar
True

(and this is not happening because the example string is small here; it also happens with larger strings, like 'bar' * 42000)

Back to our loop:

new += diff.text_data

Now, new is realloc()ated to have the right size to fit the appended text in it, and the contents of diff.text_data are copied. One realloc, one copy.

Let’s go for a second iteration:

new += data[diff.end:new_diff.start]

Here again, we’re doing an allocation for the substring, and one copy. Then new is realloc()ated again to append the substring, which is an additional copy.

new += new_diff.text_data

new is realloc()ated yet again to append the contents of new_diff.text_data.

We now finish with:

new += data[new_diff.end:]

which, again, creates a substring from the data, and then proceeds to realloc()ate new one more freaking time.

That’s a lot of malloc()s and realloc()s to be doing…

  • It is possible to limit the number of realloc()s by using new = bytearray() instead of new = ''. I haven’t looked at the CPython code to see what the growth strategy is, but, for example, appending a 4KB string to a 500KB bytearray makes it grow to 600KB instead of 504KB, like what happens when using str.
  • It is possible to avoid realloc()s completely by preallocating the right size for the bytearray (with bytearray(size)), but that requires looping over the patch once first to know the new size, or using an estimate (the new manifest can’t be larger than the size of the previous manifest + the size of the patch) and truncating later (although I’m not sure it’s possible to truncate a bytearray without a realloc()). As a downside, this initializes the buffer with null bytes, which is a waste of time.
  • Another possibility is to reuse bytearrays previously allocated for previous manifests.
  • Yet another possibility is to accumulate the strings to append and use ''.join(). CPython is smart enough to create a single allocation for the total size in that case. That would be the most convenient solution, but see below.
  • It is possible to avoid the intermediate allocations and copies for substrings from the original manifest by using memoryview.
  • Unfortunately, you can’t use ''.join() on a list of memoryviews before Python 3.4.
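A minimal sketch of the join() and memoryview ideas combined (the fourth and fifth items), for Python 3.4+ where b''.join() accepts memoryviews; the Diff type here is a hypothetical stand-in for the RevDiff entries described above:

```python
from collections import namedtuple

# Hypothetical stand-in for a RevDiff entry:
# "replace bytes [start:end) of the old data with text_data".
Diff = namedtuple("Diff", ["start", "end", "text_data"])

def apply_patch(data, diffs):
    """Rebuild the patched buffer with zero-copy slices and one final allocation."""
    view = memoryview(data)              # slicing a memoryview copies nothing
    parts = []
    end = 0
    for d in diffs:
        parts.append(view[end:d.start])  # unchanged chunk, no intermediate copy
        parts.append(d.text_data)
        end = d.end
    parts.append(view[end:])
    return b"".join(parts)               # single allocation of the final size
```

Each unchanged chunk of the original is referenced rather than copied, and the join performs one allocation sized for the whole result, avoiding the realloc() churn entirely.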

After modifying the code to implement the first and fifth items, memory usage during a git clone of mozilla-central looks like the following (with the python allocator enabled):

(Note this hasn’t actually landed on the master branch yet)

Compared to what it looked like before, this is largely better. But that’s not the only difference: the clone was also about 1000 seconds faster. That’s more than 15 minutes! But that’s not so surprising when you know the volumes of data handled here. More insight about this coming in an upcoming post.

But while the changes outlined above make the glibc allocator behavior less likely to happen, they don’t totally eliminate it. In fact, it seems it is still happening by the end of the manifest import phase. We’re still allocating increasingly large temporary buffers because the size of the imported manifests grows larger and larger, and every one of them is the result of patching a previous one.

The only way to avoid those large allocations creating holes would be to avoid doing them in the first place. My first attempt at doing that, keeping manifests as lists of lines instead of raw buffers, worked, but was terribly slow. So slow, in fact, that I had to stop a clone early and estimated the process would likely have taken a couple days. Iterating over multiple generators at the same time, a lot, kills performance, apparently. I’ll have to try with significantly less of that.

Planet DebianElena 'valhalla' Grandi: XMPP VirtualHosts, SRV records and letsencrypt certificates

XMPP VirtualHosts, SRV records and letsencrypt certificates

When I set up my XMPP server, a friend of mine asked if I was willing to have a virtualhost with his domain on my server, using the same address as the email.

Setting up prosody and the SRV record on the DNS was quite easy, but then we stumbled on the issue of certificates: of course we would like to use letsencrypt, but as far as we know that means that we would have to setup something custom so that the certificate gets renewed on his server and then sent to mine, and that looks more of a hassle than just him setting up his own prosody/ejabberd on his server.

So I was wondering: dear lazyweb, did any of you have the same issue and already came up with a solution that is easy to implement and trivial to maintain that we missed?

Planet Linux AustraliaGabriel Noronha: Flir ONE Issues

FLIR ONE for iOS or Android with solid orange power light

Troubleshooting steps when the FLIR ONE has a solid red/orange power light that will not turn to blinking green:

  • Perform a hard reset on the FLIR ONE by holding the power button down for 30 seconds.
  • Let the battery drain overnight and try charging it again (with another charger if possible) for a whole hour.

Planet DebianClint Adams: Then Moises claimed that T.G.I. Friday's was acceptable

“Itʼs really sad listening to a friend talk about how he doesnʼt care for his wife and doesnʼt find her attractive anymore,” he whined, “while at the same time talking about the kid she is pregnant with—obviously they havenʼt had sex in awhile—and how though he only wants one kid, she wants multiple so they will probably have more. He said he couldnʼt afford to have a divorce. He literally said that one morning, watching her get dressed he laughed and told her, ‘Your boobs look weird.’ She didnʼt like that. I reminded him that they will continue to age. That didnʼt make him feel good. He said that he realized before getting married that he thought he was a good person, but now heʼs realizing heʼs a bad person. He said he was a misogynist. I said, ‘Worse, youʼre the type of misogynist who pretends to be a feminist.’ He agreed. He lived in Park Slope, but he moved once they became pregnant.”

“Good luck finding a kid-friendly restaurant,” she said.

Posted on 2017-03-22
Tags: umismu

Planet DebianDirk Eddelbuettel: anytime 0.2.2

A bugfix release of the anytime package arrived at CRAN earlier today. This is the tenth release since the inaugural version late last summer, and the second (bugfix / feature) release this year.

anytime is a very focused package aiming to do just one thing really well: to convert anything in integer, numeric, character, factor, ordered, ... format to either POSIXct or Date objects -- and to do so without requiring a format string. See the anytime page, or the GitHub repo, for a few examples.

This release addresses an annoying bug related to British TZ settings and the particular impact of a change in 1971, and generalizes input formats to accept integer or numeric format in two specific ranges. Details follow below:

Changes in anytime version 0.2.2 (2017-03-21)

  • Address corner case of integer-typed (large) values corresponding to POSIXct time (PR #57, closing #56)

  • Add special case for ‘Europe/London’ and 31 Oct 1971 BST change to avoid a one-hour offset error (#58 fixing #36 and #51)

  • Address another corner case of numeric values corresponding to Date types which are now returned as Date

  • Added file init.c with calls to R_registerRoutines() and R_useDynamicSymbols(); already used .registration=TRUE in useDynLib in NAMESPACE

Courtesy of CRANberries, there is a comparison to the previous release. More information is on the anytime page.

For questions or comments use the issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.


Planet DebianSteinar H. Gunderson: 10-bit H.264 support

Following my previous tests about 10-bit H.264, I did some more practical tests; since is up again, I did some tests with actual 10-bit input. The results were pretty similar, although of course 4K 60 fps organic content is going to be different at times from the partially rendered 1080p 24 fps clip I used.

But I also tested browser support, with good help from people on IRC. It was every bit as bad as I feared: Chrome on desktop (Windows, Linux, macOS) supports 10-bit H.264, although of course without hardware acceleration. Chrome on Android does not. Firefox does not (it tries on macOS, but plays back buggy). iOS does not. VLC does; I didn't try a lot of media players, but obviously ffmpeg-based players should do quite well. I haven't tried Chromecast, but I doubt it works.

So I guess that yes, it really is 8-bit H.264 or 10-bit HEVC—but I haven't tested the latter yet either :-)

Planet DebianMatthew Garrett: Announcing the Shim review process

Shim has been hugely successful, to the point of being used by the majority of significant Linux distributions and many other third party products (even, apparently, Solaris). The aim was to ensure that it would remain possible to install free operating systems on UEFI Secure Boot platforms while still allowing machine owners to replace their bootloaders and kernels, and it's achieved this goal.

However, a legitimate criticism has been that there's very little transparency in Microsoft's signing process. Some people have waited for significant periods of time before receiving a response. A large part of this is simply that demand has been greater than expected, and Microsoft aren't in the best position to review code that they didn't write in the first place.

To that end, we're adopting a new model. A mailing list has been created at, and members of this list will review submissions and provide a recommendation to Microsoft on whether these should be signed or not. The current set of expectations around binaries to be signed is documented here, and the current process here - it is expected that this will evolve slightly as we get used to the process, and we'll provide a more formal set of documentation once things have settled down.

This is a new initiative and one that will probably take a little while to get working smoothly, but we hope it'll make it much easier to get signed releases of Shim out without compromising security in the process.


Krebs on SecurityStudent Aid Tool Held Key for Tax Fraudsters

Citing concerns over criminal activity and fraud, the U.S. Internal Revenue Service (IRS) has disabled an automated tool on its Web site that was used to help students and their families apply for federal financial aid. The removal of the tool has created unexpected hurdles for many families hoping to qualify for financial aid, but the action also eliminated a key source of data that fraudsters could use to conduct tax refund fraud.

Last week, the IRS and the Department of Education said in a joint statement that they were temporarily shutting down the IRS’s Data Retrieval Tool. The service was designed to make it easier to complete the Education Department’s Free Application for Federal Student Aid (FAFSA) — a lengthy form that serves as the starting point for students seeking federal financial assistance to pay for college or career school.

The U.S. Department of Education's FAFSA federal student aid portal. A notice about the closure of the IRS's data retrieval tool can be seen in red at the bottom right of this image.


In response to requests for comment, the IRS shared the following statement: “As part of a wider, ongoing effort at the IRS to protect the security of data, the IRS decided to temporarily suspend their Data Retrieval Tool (DRT) as a precautionary step following concerns that information from the tool could potentially be misused by identity thieves.”

“The scope of the issue is being explored, and the IRS and FSA are jointly investigating the issue,” the statement continued. “At this point, we believe the issue is relatively isolated, and no additional action is needed by taxpayers or people using these applications. The IRS and FSA are actively working on a way to further strengthen the security of information provided by the DRT. We will provide additional information when we have a specific timeframe for returning the DRT or other details to share.”

The removal of the IRS’s tool received relatively broad media coverage last week. For example, a story in The Wall Street Journal notes that the Treasury Inspector General for Tax Administration — which provides independent oversight of the IRS — “opened a criminal investigation into the potentially fraudulent use of the tool.”

Nevertheless, I could not find a single publication that sought to explain precisely what information identity thieves were seeking from this now-defunct online resource. Two sources familiar with the matter but who asked to remain anonymous because they were not authorized to speak on the record told KrebsOnSecurity that identity thieves were using the IRS’s tool to look up the “adjusted gross income” (AGI), which is an individual or family’s total gross income minus specific deductions.

Anyone completing a FAFSA application will need to enter the AGI as reported on the previous year’s income tax return of their parents or guardians. The AGI is listed on the IRS-1040 forms that taxpayers must file with the IRS each year. The IRS’s online tool was intended as a resource for students who needed to look up the AGI but didn’t have access to their parents’ tax returns.

Eligible FAFSA applicants could use the IRS’s data retrieval tool to populate relevant fields in the application with data pulled directly from the IRS. Countless college Web sites explain how the tool works in more detail; here’s one example (PDF).

As it happens, the AGI is also required to sign and validate electronic tax returns filed with the IRS. Consequently, the IRS’s data retrieval tool would be a terrific resource to help identity thieves successfully file fraudulent tax refund requests with the agency.

A notice from the IRS states that the adjusted gross income (AGI) is needed to validate electronically-filed tax returns.


Tax-related identity theft occurs when someone uses a Social Security number (SSN) — either a client’s, a spouse’s, or dependent’s — to file a tax return claiming a fraudulent refund. Thieves may also use a stolen Employer Identification Number (EIN) from a business client to create false Forms W-2 to support refund fraud schemes. Increasingly, fraudsters are simply phishing W-2 data in large quantities from human resource professionals at a variety of organizations. However, taxpayer AGI information is not listed on W-2 forms.

Victims usually first learn of the crime after having their returns rejected because scammers beat them to it. Even those who are not required to file a return can be victims of refund fraud, as can those who are not actually due a refund from the IRS.

This would not be the first time tax refund fraudsters abused an online tool made available by the IRS. During the height of tax-filing season in 2015, identity thieves used the IRS’s “Get Transcript” feature to glean salary and personal information they didn’t already have on targeted taxpayers. In May 2015, the IRS suspended the Get Transcript feature, citing its abuse by fraudsters and noting that some 100,000 taxpayers may have been victimized as a result.

In August 2015, the agency revised those estimates up to 330,000, but in February 2016, the IRS again more than doubled its estimate, saying the number of taxpayers targeted via abuse of the Get Transcript tool was probably closer to 724,000.

The IRS re-enabled its Get Transcript service last summer, saying it had fortified the system with additional security safeguards — such as requiring visitors to supply a mobile phone number that is tied to the applicant’s name.

Now, the IRS is touting its new and improved Get Transcript service as an alternative method for obtaining the information needed to complete the FAFSA.

“If you did not retain a copy of your tax return, you may be able to access the tax software you used to prepare your return or contact your tax preparer to obtain a copy,” the IRS said in its advisory on the shutdown of its data retrieval tool. “You must verify your identity to use this tool. You also may use Get Transcript by Mail or call 1-800-908-9946, and a transcript will be delivered to your address of record within five to 10 days.”

The IRS advises those who still need help completing the FAFSA to visit or call 1-800-4FED-AID (1-800-433-3243).


Here are some steps you can take to make it less likely that you will be the next victim of tax refund fraud:

-File before the fraudsters do it for you – Your primary defense against becoming the next victim is to file your taxes at the state and federal level as quickly as possible. Remember, it doesn’t matter whether or not the IRS owes you money: Thieves can still try to impersonate you and claim that they do, leaving you to sort out the mess with the IRS later.

-Get on a schedule to request a free copy of your credit report. By law, consumers are entitled to a free copy of their report from each of the major bureaus once a year. Put it on your calendar to request a copy of your file every three to four months, each time from a different credit bureau. Dispute any unauthorized or suspicious activity. This is where credit monitoring services are useful: Part of their service is to help you sort this out with the credit bureaus, so if you’re signed up for credit monitoring make them do the hard work for you.

-File form 14039 and request an IP PIN from the government. This form requires consumers to state they believe they’re likely to be victims of identity fraud. Even if thieves haven’t tried to file your taxes for you yet, virtually all Americans have been touched by incidents that could lead to ID theft — even if we just look at breaches announced in the past year alone.

-Consider placing a “security freeze” on one’s credit files with the major credit bureaus. See this tutorial about why a security freeze — also known as a “credit freeze,” may be more effective than credit monitoring in blocking ID thieves from assuming your identity to open up new lines of credit. While it’s true that having a security freeze on your credit file won’t stop thieves from committing tax refund fraud in your name, it would stop them from fraudulently obtaining your IP PIN.

-Monitor, then freeze. Take advantage of any free credit monitoring available to you, and then freeze your credit file with the four major bureaus. Instructions for doing that are here.

Planet DebianReproducible builds folks: Reproducible Builds: week 99 in Stretch cycle

Here's what happened in the Reproducible Builds effort between Sunday March 12 and Saturday March 18 2017:

Upcoming events

Reproducible Builds Hackathon Hamburg 2017

The Reproducible Builds Hamburg Hackathon 2017, or RB-HH-2017 for short, is a 3-day hacking event taking place May 5th-7th in the CCC Hamburg Hackerspace located inside Frappant, a collective art space in a historical monument in Hamburg, Germany.

The aim of the hackathon is to spend some days working on Reproducible Builds in every distribution and project. The event is open to anybody interested in working on Reproducible Builds issues, with or without prior experience!

Accommodation is available and travel sponsorship may be available by agreement. Please register your interest as soon as possible.

Reproducible Builds Summit Berlin 2016

This is just a quick note, that all the pads we've written during the Berlin summit in December 2016 are now online (thanks to Holger), nicely complementing the report by Aspiration Tech.

Request For Comments for new specification: BUILD_PATH_PREFIX_MAP

Ximin Luo posted a draft version of our BUILD_PATH_PREFIX_MAP specification for passing build-time paths between high-level and low-level build tools. This is meant to help eliminate irreproducibility caused by different paths being used at build time. At the time of writing, this affects an estimated 15-20% of 25000 Debian packages.

This is a continuation of an older proposal SOURCE_PREFIX_MAP, which has been updated based on feedback on our patches from GCC upstream, attendees of our Berlin 2016 summit, and participants on our mailing list. Thanks to everyone that contributed!

The specification also contains runnable source code examples and test cases; see our git repo.

Please comment on this draft ASAP - we plan to release version 1.0 of this in a few weeks.

Toolchain changes

  • #857632 apt: ignore the currently running kernel if attempting a reproducible build (Chris Lamb)
  • #857803 shadow: Make the sp_lstchg shadow field reproducible. (Chris Lamb)
  • #857892 fontconfig: please make the cache files reproducible (Chris Lamb)

Packages reviewed and fixed, and bugs filed

Chris Lamb:

Reviews of unreproducible packages

5 package reviews have been added, 274 have been updated and 800 have been removed this week, adding to our knowledge about identified issues.

1 issue type has been added:

Weekly QA work

During our reproducibility testing, FTBFS bugs have been detected and reported by:

  • Chris Lamb (5)
  • Mattia Rizzolo (1)

diffoscope development

diffoscope 79 and 80 were uploaded to experimental by Chris Lamb. It included contributions from:

Chris Lamb:

  • Ensure that we really are using ImageMagick. (Closes: #857940)
  • Extract SquashFS images in one go rather than per-file, speeding up (e.g.) Tails ISO comparison by ~10x.
  • Support newer versions of cbfstool to avoid test failures. (Closes: #856446)
  • Skip icc test that varies on endian if the Debian-specific patch is not present. (Closes: #856447)
  • Compare GIF images using gifbuild. (Closes: #857610)
  • Various other code quality, build and UI improvements.

Maria Glukhova:

  • Improve AndroidManifest.xml comparison for APK files. (Closes: #850758)

strip-nondeterminism development

strip-nondeterminism 0.032-1 was uploaded to unstable by Chris Lamb. It included contributions from:

Chris Lamb:

  • Fix a possible endless loop while stripping ar files due to trusting the file's file size data. Thanks to Tobias Stoeckmann for the report, patch and testcase. (Closes: #857975)
  • Add support for testing files we should reject.


This week's edition was written by Ximin Luo, Holger Levsen and Chris Lamb & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

Planet DebianTanguy Ortolo: Bad support of ZIP archives with extra fields

For sharing multiple files, it is often convenient to pack them into an archive, and the most widely supported format to do so is probably ZIP. Under *nix, you can archive a directory with Info-ZIP:

% zip -r something.zip something/

(When you have several files, it is recommended to archive them in a directory, to avoid cluttering the directory where people will extract them.)

Unsupported ZIP archive

Unfortunately, while we would expect ZIP files to be widely supported, I found out that this is not always the case, and I had many recipients failing to open them under operating systems such as iOS.

Avoid extra fields

That issue seems to be linked to the use of extra file attributes, which are enabled by default in order to store Unix file metadata. The extra field designed to store such attributes was specified from the beginning so that each implementation can take into account the attributes it supports and ignore any others, but some buggy ZIP implementations appear not to function at all when they are present.

Therefore, unless you actually need to preserve Unix file metadata, you should avoid using extra fields. With Info-ZIP, you would have to add the option -X:

% zip -rX something.zip something/
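To see whether an archive already carries extra fields, Python's zipfile module exposes each member's raw extra field. A minimal sketch, building a throwaway in-memory archive since the archive name above is just an example:

```python
import io
import zipfile

# Build a small throwaway archive in memory and inspect each
# member's "extra" field. Info-ZIP's -X option roughly corresponds
# to this field being empty for every member.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("something/readme.txt", "hello")

with zipfile.ZipFile(buf) as zf:
    for info in zf.infolist():
        # A non-empty info.extra means the member carries extra fields
        print(info.filename, len(info.extra))
```

On a real archive you would open it by filename instead of the in-memory buffer.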

CryptogramNSA Documents from before 1930

Here is a listing of all the documents that the NSA has in its archives that are dated earlier than 1930.

Harald WelteOsmocom - personal thoughts

As I just wrote in my post about TelcoSecDay, I sometimes worry about the choices I made with Osmocom, particularly when I see all the great stuff people are doing in fields that I was previously working in, such as applied IT security as well as Linux kernel development.


When people like Dieter, Holger and I started to play with what later became OpenBSC, it was just for fun. A challenge to master. A closed world to break open and attack with the tools, the mindset and the values that we brought with us.

Later, Holger and I started to do freelance development for commercial users of Osmocom (initially basically only OpenBSC, but then OsmoSGSN, OsmoBSC, OsmoBTS, OsmoPCU and all the other bits on the infrastructure side). This led to the creation of sysmocom in 2011, and ever since we have been trying to use revenue from hardware sales as well as development contracts to subsidize and grow the Osmocom projects. We're investing most of our earnings directly into more staff that in turn works on Osmocom related projects.


It's important to draw the distinction between the Osmocom cellular infrastructure projects, which are mostly driven by commercial users and sysmocom these days, and the many other pure just-for-fun community projects under the Osmocom umbrella, like OsmocomTETRA, OsmocomGMR, rtl-sdr, etc. I'm focussing only on the cellular infrastructure projects, as they have been at the center of my life for the past 6+ years.

In order to do this, I basically gave up my previous career[s] in IT security and Linux kernel development (as well as put other things on hold). This is a big price to pay for creating more FOSS in the mobile communications world, and sometimes I'm a bit melancholic about the "old days" before.

Financial wealth is clearly not my primary motivation, but let me be honest: I could have easily earned a shitload of money continuing to do freelance Linux kernel development, IT security or related consulting. There's a lot of demand for related skills, particularly with some experience and reputation attached. But I decided against it, and worked several years without a salary (or almost none) on Osmocom related stuff [as did Holger].

But then, even with all the sacrifices made, and the amount of revenue we can direct from sysmocom into Osmocom development: given the complexity of cellular infrastructure, the amount of funding and resources is always only a fraction of what one would normally want for a proper implementation. So it's a constant resource shortage, combined with lots of unpaid work in those areas that are not on the immediate short-term feature list of customers, and that nobody else in the community wants to work on. And that can be a bit frustrating at times.

Is it worth it?

So after 7 years of OpenBSC, OsmocomBB and all the related projects, I'm sometimes asking myself whether it has been worth the effort, and whether it was the right choice.

It was the right choice in the sense that cellular technology is still an area that's obscure and unknown to many, and that has very little FOSS (though improving!). At the same time, cellular networks are becoming more and more essential to many users and applications. So on an abstract level, I think that every step in the direction of FOSS for cellular is as urgently needed as before, and we have had quite some success in implementing many different protocols and network elements. Unfortunately, in most cases incompletely, as the amount of funding and/or resources was always extremely limited.


On the other hand, when it comes to metrics such as personal satisfaction or professional pride, I'm not very happy or satisfied. The community remains small, the commercial interest remains limited, and as opposed to the Linux world, most players have a complete lack of understanding that FOSS is not a one-way road, but that it is important for all stakeholders to contribute to the development in terms of development resources.

Project success?

I think a collaborative development project (which to me is what FOSS is about) is only then truly successful, if its success is not related to a single individual, a single small group of individuals or a single entity (company). And no matter how much I would like the above to be the case, it is not true for the Osmocom cellular infrastructure projects. Take away Holger and me, or take away sysmocom, and I think it would be pretty much dead. And I don't think I'm exaggerating here. This makes me sad, and after all these years, and after knowing quite a number of commercial players using our software, I would have hoped that the project rests on many more shoulders by now.

This is not to belittle the efforts of all the people contributing to it, whether the team of developers at sysmocom, whether those in the community that still work on it 'just for fun', or whether those commercial users that contract sysmocom for some of the work we do. Also, there are known and unknown donors/funders, like the NLnet foundation for some parts of the work. Thanks to all of you, and clearly we wouldn't be where we are now without all of that!

But I feel it's not sufficient for the overall scope, and it's not [yet] sustainable at this point. We need more support from all sides, particularly those not currently contributing. From vendors of BTSs and related equipment that use Osmocom components. From operators that use it. From individuals. From academia.

Yes, we're making progress. I'm happy about new developments like the Iu and Iuh support, the OsmoHLR/VLR split and 2G/3G authentication that Neels just blogged about. And there's progress on the SIMtrace2 firmware with card emulation and MITM, just as well as there's progress on libosmo-sigtran (with a more complete SUA, M3UA and connection-oriented SCCP stack), etc.

But there are too little people working on this, and those people are mostly coming from one particular corner, while most of the [commercial] users do not contribute the way you would expect them to contribute in collaborative FOSS projects. You can argue that most people in the Linux world also don't contribute, but then the large commercial beneficiaries (like the chipset and hardware makers) mostly do, as are the large commercial users.

All in all, I have the feeling that Osmocom is as important as it ever was, but it's not grown up yet to really walk on its own feet. It may be able to crawl, though ;)

So for now, don't panic. I'm not suffering from burn-out, mid-life crisis and I don't plan on any big changes of where I put my energy: It will continue to be Osmocom. But I also think we have to have a more open discussion with everyone on how to move beyond the current situation. There's no point in staying quiet about it, or to claim that everything is fine the way it is. We need more commitment. Not from the people already actively involved, but from those who are not [yet].

If that doesn't happen in the next let's say 1-2 years, I think it's fair that I might seriously re-consider in which field and in which way I'd like to dedicate my [I would think considerable] productive energy and focus.

Harald WelteReturning from TelcoSecDay 2017 / General Musings

I'm just on my way back from the Telecom Security Day 2017, which is an invitation-only event about telecom security issues hosted by ERNW back-to-back with their Troopers 2017 conference.

I've been presenting at TelcoSecDay in previous years and hence was again invited to join (as an attendee). The event has really gained quite some traction: where early on the audience was mostly the IT security / hacker crowd, the number of participants from the operator (and, to a smaller extent, equipment maker) industry has been growing.

The quality of talks was great, and I enjoyed meeting various familiar faces. It's just a pity that it's only a single day; plus I had to head back to Berlin the same day, so I had to skip the dinner + social event.

When attending events like this, and seeing the interesting hacks that people are working on, it pains me a bit that I haven't really been doing much security work in recent years. netfilter/iptables was at least somewhat security related. My work on OpenPCD / librfid was clearly RFID security oriented, as was the work on airprobe, OsmocomTETRA, or even the EasyCard payment system hack.

I have the same feeling when attending Linux kernel development related events. I have very fond memories of working in both fields, and it was a lot of fun. Also, to be honest, I believe that the work in Linux kernel land and the general IT security research was/is appreciated much more than the endless months and years I'm now spending my time with improving and extending the Osmocom cellular infrastructure stack.

Beyond the appreciation, it's also the fact that both the IT security and the Linux kernel communities are much larger. There are more people to learn from and learn with, to engage in discussions and ping-pong ideas. In Osmocom, the community is too small (and I have the feeling, it's actually shrinking), and in many areas it rather seems like I am the "ultimate resource" to ask, whether about 3GPP specs or about Osmocom code structure. What I'm missing is the feeling of being part of a bigger community. So in essence, my current role in the "Open Source Cellular" corner can be a very lonely one.

But hey, I don't want to sound more depressed than I am, this was supposed to be a post about TelcoSecDay. It just happens that attending IT Security and/or Linux Kernel events makes me somewhat gloomy for the above-mentioned reasons.

Meanwhile, if you have some interesting projects/ideas at the border between cellular protocols/systems and security, I'd of course love to hear if there's some way to get my hands dirty in that area again :)

Google AdsenseAnnouncing the Winners of the Certified Publishing Partner 2016 Summer Challenge

Certified Publishing Partners are trained experts on AdSense, DoubleClick for Publishers, and DoubleClick Ad Exchange who can help publishers like you earn more while also saving you time. These Google partners can help you with a range of services from ads monetization to design and development support, allowing you to focus on creating quality content for your site.

The inaugural Certified Publishing Partner 2016 Summer Challenge launched in July 2016 to identify and recognize Certified Publishing Partners who show significant dedication and expertise in mobile, customer service, and innovation. Please join us in congratulating our three winning partners:

Customer Satisfaction Award Winner: AdThrive, USA

About the award: This award recognizes the partner who demonstrates outstanding overall quality of customer service for publishers.

Why AdThrive: AdThrive specializes in helping bloggers monetize their sites so that they can focus on blogging. According to our latest customer satisfaction survey, AdThrive scored 95/100. In addition to delivering the results publishers expect, AdThrive treats publishers in a way that makes them feel valued, and has knowledgeable, friendly and approachable staff. This is a prime example of putting publishers' needs first and delivering tangible results in an efficient and scalable way.

Congratulations to the AdThrive team!

Mobile Champion Award Winner: WOSO, China

About the award: This award recognizes the partner who demonstrates strong expertise in helping publishers capture mobile opportunities with strong user experiences and effective monetization. 

Why WOSO: We believe that Certified Publishing Partners should take the lead in helping publishers of all sizes in various markets successfully adapt to a mobile-first environment. Our 2016 winner, WOSO, has done exactly that. A leading pioneer in China's online advertising and SEM, WOSO helped publishers achieve significant year-over-year growth in mobile web monetization during the Summer Challenge in Q3 2016. Way to go WOSO!

Business Innovation Award Winner: Ezoic, USA

About the award: This award recognizes innovations that drive real business impact and challenge the status quo. We continually encourage our partners to develop differentiating value-add services and solutions. 

Why Ezoic: We received more than 20 submissions for this award, showing the breadth of innovation that our partners are bringing to publishers. Our 2016 winner, Ezoic, specializes in an automated testing platform for content publishers. In 2016 they launched Ad Tester, a machine-learning-based solution designed to optimize user experience and ad earnings at the same time. 

We look forward to many more innovations from Ezoic in the future. 

Thank you to all of the Summer Challenge participants and congratulations to the winning partners!

Posted by Sean Meng,
Global Program Lead, Google’s Certified Publishing Partner Program

About Google Certified Publishing Partnerships:

A Certified Publishing Partner can help when you don’t want to do it alone. Our publishing partners handle everything from setting up to optimizing and maintaining ads, so you’re free to spend more time publishing content on your site. Using Google best practices, our publishing partners are adept at maximizing performance and earnings with AdSense, DoubleClick Ad Exchange, and DoubleClick for Publishers. For more information, visit our website. 

CryptogramWikiLeaks Not Disclosing CIA-Hoarded Vulnerabilities to Companies

WikiLeaks has started publishing a large collection of classified CIA documents, including information on several -- possibly many -- unpublished (i.e., zero-day) vulnerabilities in computing equipment used by Americans. Despite assurances that the US government prioritizes defense over offense, it seems that the CIA was hoarding vulnerabilities. (It's not just the CIA; last year we learned that the NSA is, too.)

Publishing those vulnerabilities into the public means that they'll get fixed, but it also means that they'll be used by criminals and other governments in the time period between when they're published and when they're patched. WikiLeaks has said that it's going to do the right thing and privately disclose those vulnerabilities to the companies first.

This process seems to be hitting some snags:

This week, Assange sent an email to Apple, Google, Microsoft and all the companies mentioned in the documents. But instead of reporting the bugs or exploits found in the leaked CIA documents it has in its possession, WikiLeaks made demands, according to multiple sources familiar with the matter who spoke on condition of anonymity.

WikiLeaks included a document in the email, requesting the companies to sign off on a series of conditions before being able to receive the actual technical details to deploy patches, according to sources. It's unclear what the conditions are, but a source mentioned a 90-day disclosure deadline, which would compel companies to commit to issuing a patch within three months.

I'm okay with a 90-day window; that seems reasonable. But I have no idea what the other conditions are, and how onerous they are.

Honestly, at this point the CIA should do the right thing and disclose all the vulnerabilities to the companies. They're burned as CIA attack tools. I have every confidence that Russia, China, and several other countries can hack WikiLeaks and get their hands on a copy. By now, their primary value is for defense. The CIA should bypass WikiLeaks and get the vulnerabilities fixed as soon as possible.

Worse Than FailureTales from the Interview: That Lying First Impression


Dima had just finished her Masters in electrical engineering, and was eagerly seeking out a job. She didn't feel any particular need to stick close to her alma mater, so she'd been applying to jobs all over the country.

When the phone rang during lunch hour, she was excited to learn it was a recruiter. After confirming he had the right person on the phone, he got right down to business: "We saw your resume this morning, and we're very impressed. We'd like you to come out for an on-site interview and tour. What's your availability next week?"

Dima agreed. It was only after she hung up that she realized he'd never given his name or company. Thankfully, he sent her an email within ten minutes with the information. It seemed he was representing DefCo, a major defense contractor with the US government. This would normally be worth a look; it was particularly interesting, however, because she'd only submitted her resume about an hour and a half prior.

They must be really impressed, she thought as she replied to confirm the travel arrangements. It'll be nice working someplace large that doesn't take forever to get things done.

A week later, Dima hopped out of the cab and made her way into the building. Wrinkle number one immediately presented itself: there were at least twenty other people standing around looking nervous and holding resumes.

I guess they interview in groups? she wondered. Well, they're clearly efficient.

As Dima waited to tour her first top-secret manufacturing plant, she made small talk with some of the other candidates, and hit wrinkle number two: they weren't all here for the same job. Several were business majors, others had only a high school diploma, while others were mathematicians and liberal arts majors.

Clearly they're consolidating the tour. Then we'll split up for interviews ...?

The tour guide, a reedy man with a nervous demeanor and a soft, timid voice, informed them that interviews would be conducted later in the day, after the tour. He walked them down the hallway.

Dima kept close to near the front so she could hear what he was saying. She needn't have bothered. As they passed the first closed door, he gestured to it and stammered out, "This might be a lab, I think? It could be one of the engineering labs, or perhaps one of the test facilities. They might even be writing software behind there. It's bound to be something exciting."

This went on for the better part of two hours. They passed locked door after locked door, with their guide only speculating on what might be inside as he fidgeted with his glasses and avoided eye contact. Finally, he declared, "And now, we'll tour the test facilities. Right this way to the warehouse, please. You're going to love this."

Wait, he didn't hedge his bets? We might actually see something today?! Dima knew better than to get her hopes up, but she couldn't help it. It wasn't as though they could get any lower.

They were let into the warehouse, and their guide took them straight toward one particular corner. As they crowded around what appeared to be an ordinary truck, their guide explained its significance in hushed, breath-taken tones: "This is the system upon which our new top-secret mobile Smart-SAM and cross-pulsed radar will be mounted. Soon, this will be one of the most advanced mobile platforms in the United States!"

And soon, it will be exciting, thought Dima in dismay. Right now, it's a truck.

"This concludes our tour," announced the guide, and it was all Dima could do not to groan. At least the interview is next. That can't be nearly as much of a let-down as the tour.

Dima was shown to a waiting area with the mathematician, while the others were split into their own separate areas. She was called back for her interview moments later. At least they're still punctual?

The interviewer introduced himself, and they shook hands. "Have you ever worked on a power supply, Dima?" he asked, which seemed like a logical question to begin the interview. She was just about to answer when he continued, "Just last week I was working on the supply for our cross-pulsed radar. That thing is huge, you wouldn't even believe it. Of course, it's not the biggest one I've ever built. Let's see now, that would've been back in '84 ..."

To her horror, he continued in this vein for fifteen minutes, discussing all the large power supplies he'd worked on. For the last five minutes of the interview he changed topics, discussing sound amplifiers you could run off those power supplies, and then which bands would make best use of them (Aerosmith? Metallica? Dima didn't care. She just kept nodding, no longer bothering to even smile). Finally, he thanked her for her time, and sent her on her way.

The next day, Dima was informed that she hadn't obtained the position. She breathed a sigh of relief and went on with her search.


Sam VargheseThe AFR has lost its dictionary. And its style guide. And its subs

The Australian Financial Review claims to be one of the better newspapers in the country. But as is apparent from what follows, the paper lacks sub-editors who can spell or who have any knowledge of grammar.

Fairfax Media has an almighty big style guide, but the AFR seems to have thrown it out, along with any competent sub-editors.

All this is taken from a single article titled “Malcolm Turnbull wins support to water down race hate laws” on 21 March. Just imagine how many screw-ups there are in the entire paper. And the paper still complains it is losing readers. Guess why?


In “an” move? Surely that should be “in a move”?


“And the strengthen”? That “the” is dangling there like a limp dick in the breeze. Cut it off.


“Portrayed” is Mrs Malaprop at her brilliant best. The word is “betrayed”. And “ths” one takes it is “this” with the vowel dropped en route to the screen.


Pretense, not pretence. And yanked, not ranked.


Will? No, it should be would. Usage is always hypothetical and possible.


“The legislation” is singular. It cannot be later described as “they are”. The paragraph should read: “The legislation for the change will be introduced into the Senate first and has little prospect of passing because it is opposed by Labor, the Greens and NIck Xenophon.” And it’s Nick, not NIck.


Outbursts of anger. Not outburst. Plural as opposed to singular. Got it?


Not sure how Abetz is being described in the plural. Or did somebody include the obnoxious Cory Bernardi without naming him?


Shadow minister for citizenship and what??? And surely, one uses past tense in sentences like this – had not has?


Here, the word “to” seems to have gone AWOL.


I know Steve Ciobo is a dunce, but should one leave even his sentences dangling like this?


A comma in time saves nine. Just saying.

TEDA night to talk about design


Designers solve problems and bring beauty to the world. At TEDNYC Design Lab, a night of talks at TED HQ in New York City hosted by design curator Chee Pearlman with content producer Cloe Shasha, six speakers pulled back the curtain to reveal the hard work and creative process behind great design. Speakers covered a range of topics, including the numbing monotony of modern cities (and how to break it), the power of a single image to tell a story and the challenge of building a sacred space in a secular age.

First up was Pulitzer-winning music and architecture critic Justin Davidson.

The touchable city. Shiny buildings are an invasive species, says Pulitzer-winning architecture critic Justin Davidson. In recent years, cities have become smooth, bright and reflective, as new downtowns sprout clusters of tall buildings that are almost always made of steel and glass. While glass can be beautiful (and easily transported, installed and replaced), the rejection of wood, sandstone, terra cotta, copper and marble as building materials has led to the simplification and impoverishment of the architecture in cities — as if we wanted to reduce all of the world’s cuisines to the blandness of airline food. “The need for shelter is bound up with the human desire for beauty,” Davidson says. “A city’s surfaces affect the way we live in it.” Buildings create the spaces around them; ravishing public places such as the Plaza Mayor in Salamanca, Spain, and the 17th-century Place des Vosges in Paris draw people in and make life look like an opera set, while glass towers push people away. Davidson warns of the dangers of this global trend: “When a city defaults to glass as it grows, it becomes a hall of mirrors: uneasy, disquiet and cold.” By offering a series of contemporary examples, Davidson calls for “an urban architecture that honors the full range of urban experience.”

“The main thing we need right now is a good cartoon,” says Françoise Mouly. (Photo: Ryan Lash / TED)

The power of an image to capture a moment. The first cover of The New Yorker depicted a dandy looking at a butterfly through a monocle. Now referred to as “Eustace Tilley,” this iconic image was a tongue-in-cheek response to the stuffy aristocrats of the Jazz Age. When Françoise Mouly joined the magazine as art editor in 1993, she sought to restore the same spirit of humor to a magazine that had grown staid. In doing so, Mouly looked back into how The New Yorker covers reflected moments in history, finding that covers from the Great Depression revealed what made people laugh in times of hardship. For every anniversary edition of The New Yorker, a new version of the Eustace Tilley appears on the cover. This year, we see Vladimir Putin as the monocled Eustace Tilley peering at his butterfly, Donald Trump. For Mouly, “Free press is essential to our democracy. Artists can capture what is going on — with just ink and watercolor, they can capture and enter into a cultural dialogue, putting artists at the center of culture.”


Sinéad Burke shared insights into a world that many designers don’t see, challenging the idea that design is only a tool to create function and beauty. “Design can inflict vulnerability on a group whose needs aren’t considered,” she says. (Photo: Ryan Lash / TED)

What is accessible design? “Design inhibits my independence and autonomy,” says educator and fashion blogger Sinéad Burke, who was born with achondroplasia (which translates as “without cartilage formation”), the most common form of dwarfism. At 105 centimeters (or 3 feet 5 inches) tall, Burke is acutely aware of details that are practically invisible to the rest of us — like the height of the lock in a bathroom stall or the range of available shoe sizes. So-called “accessible spaces” like bathrooms for people in wheelchairs are barely any better. In a stellar talk, Burke offers us a new perspective on the physical world we live in and asks us to consider the limits and biases of accessible design.

The beat of the Book Tree. Sofi Tukker brought the audience to their feet with hits “Hey Lion” and “Awoo,” featuring Betta Lemme. For the New York City–based duo, physical performance is a crucial element of their onstage presence, demonstrated through the use of a unique standing instrument they designed called the “Book Tree,” made from actual books attached to a sampler, so that each hit on a book triggers a beat. Their debut album, Soft Animals, was released in July 2016, and their single “Drinkee” was nominated for Best Dance Recording at the 2017 Grammys.

Finding ourselves in data. Giorgia Lupi was 13 when Silvio Berlusconi shocked many in Italy by becoming prime minister in 1994. Why was that election result so surprising, she wondered? As she learned, it was because the data gathered during the campaign was incomplete: simply too limited and imprecise, too skewed to give any real picture of what was going on. In the aftermath of America’s 2016 election, where most data analysts predicted the wrong outcome, Lupi, the co-founder of data firm Accurat, suggests that such events highlight larger problems behind data’s representation. When we focus on creating powerful headlines and simple messages, we often lose the point completely, forgetting that data alone cannot represent reality; beneath these numbers, human stories transform the abstract and the uncountable into something that can be seen, felt and directly reconnected to our lives and our behaviors. What we need, she says, is data humanism. “To make data [sets] faithfully representative of our human nature, and to make sure they won’t mislead us anymore, we need to start designing new ways to include empathy, imperfection and human qualities in how we collect, process, analyze and display them.”


Siamak Hariri describes his project, the Bahá’í Temple of South America in Santiago: “A prayer answered, open in all directions, capturing the blue light of dawn, the tent-like white light of day, the gold light of the afternoon, and at night, the reversal … catching the light in all kinds of mysterious ways.” (Photo: Ryan Lash / TED)

Can you design a sacred experience? Starting in 2006, architect Siamak Hariri attempted to do just that when he began his work on the Bahá’í Temple of South America in Santiago, Chile. He describes how he designed for a feeling that is at once essential and ineffable by focusing on illumination and creating a structure that captures the movement of light across the day. Hariri journeys from the quarries of Portugal, where his team found the precious stone to line the inside of the building like the silk of a jacket, to the temple’s splendid opening ceremony for an architectural experience unlike any other.

In the final talk of the night, Michael Bierut told a story of consequences, both intended and unintended. (Photo: Ryan Lash / TED)

Unintended consequences are often the best consequences. A few years ago, designer Michael Bierut was tapped by the Robin Hood Foundation to design a logo for a project to improve libraries in New York City public schools. Bierut is a legendary designer and critic — recent projects include rebranding the MIT Media Lab, reimagining the Saks Fifth Avenue logo and creating the logo for Hillary Clinton’s presidential campaign. So after some iterating, he came upon a simple idea: replacing the “i” in “library” with an exclamation point: L!BRARY, or The L!BRARY Initiative. His work on the project wasn’t over. One of the architects working on the libraries came to Bierut with a problem: the space between the library shelves, which had to be low to be accessible for kids, and the ceilings, which are often very high in the older school buildings, was calling out for design attention. He tapped his wife, a photographer, to fill in this space with a mural of beautiful portraits of schoolchildren; other schools took notice and wanted art of their own. Bierut brought in other illustrators, painters and artists to fill in the spaces with one-of-a-kind murals and art installations. As the new libraries opened, Bierut had a chance to visit them and the librarians who worked there, where he discovered the unintended consequences of his work. Far from designing only a logo, Bierut’s involvement in this project snowballed into a quest to bring energy, learning, art and graphics into these school libraries, where librarians dedicate themselves to exciting new generations of readers and thinkers.

Rondam Ramblings: Causality and Quantum Mechanics: a Cosmological Kalamity (Part 1 of 2)

I'm a little burned out on politics, so let's talk about religion instead. I often lurk on religious debate forums, and one of the things I've noticed over the years is that various arguments presented by Christian apologists seem to go in and out of fashion, not unlike bell bottoms and baggy pants.  At the moment, something called the Kalam Cosmological Argument (KCA) seems to be in vogue.  KCA

Rondam Ramblings: Causality and Quantum Mechanics: a Cosmological Kalamity (Part 2 of 2)

This is the second in a two-part series of posts about the Kalam Cosmological Argument for the existence of God.  If you haven't read the first part you should probably do that first, notwithstanding that I'm going to start with a quick review. To recap: the KCA is based on the central premise that "whatever begins to exist has a cause."  But quantum mechanics provides us with at least two


Planet Debian: Matthew Garrett: Buying a Utah teapot

The Utah teapot was one of the early 3D reference objects. It's canonically a Melitta, but it hasn't been part of their range in a long time, so I'd been watching eBay in the hope of one turning up. Until last week, when I discovered that a company called Friesland had apparently bought a chunk of Melitta's range some years ago and sells the original teapot[1]. I've just ordered one, and am utterly unreasonably excited about this.

[1] They have them in 0.35, 0.85 and 1.4 litre sizes. I believe (based on the measurements here) that the 1.4 litre one matches the Utah teapot.


Planet Debian: Shirish Agarwal: Tale of two countries, India and Canada

Apologies – the first blog post got sent out by mistake.

Weather comparisons between the two countries

Last year, I came to know that this year’s DebConf is happening in Canada, a cold country. Hence, a few weeks back, I started trying to find information online, and stumbled across a few discussion boards where people were discussing innerwear and outerwear; I couldn’t understand what that was all about. Then I somehow stumbled across this video of a game called The Long Dark, and just seeing a couple of episodes made it pretty clear to me why the people there were obsessing over getting the right clothes. A couple of DebConf people were talking about the weather in Montreal and, surprise, surprise, it was snowing there; in fact, it was supposed to be near the storm of the century. I was amazed to see that they have a website to track how much snow has been lifted.

If we compare that to Pune, India, weather-wise we are polar opposites. There used to be a time, when I was very young, maybe 5 years old, when once the weather went above 30 degrees Celsius, rains would fall; but now it’s going to touch 40 degrees soon. And April and May, the two hottest months, are yet to come.

China Gate

Before I venture further: I was gifted the book ‘China Gate‘, written by an author named William Arnold. When I read the cover and the back cover, it seemed the story was set between China and Taiwan; when I started reading it, I found it shares the history of Taiwan going back 200-odd years. This became relevant as next year’s DebConf, DebConf 2018, will be in Taiwan, yes, in Asia, very much near to India. I am ashamed to say that except for the Tiananmen Square Massacre and the Chinese high-speed rail there wasn’t much that I knew. According to the book, and I’m paraphrasing the gist I got, for a long time the Americans promised Taiwan it would be an independent country forever, but due to budgetary and other political constraints, the United States took the same stand as China from 1979. Interestingly, now it seems Mr. Trump wants to again recognize Taiwan as an entity separate from China itself, but as is the case with Mr. Trump, you can’t be sure of why he does what he does. Is it just a manoeuvre designed to out-smart the Chinese and start a trade war, or something else? Only time will tell.

One thing which hasn’t been shared in the book, but which I came to know via the web, is that Taiwan calls itself the ‘Republic of China’. If Taiwan wants to be independent, then why the name ‘Republic of China’? Doesn’t that strengthen China’s claim that Taiwan is an integral part of China? I don’t understand it.

The book does seduce you into thinking that the events are happening in real-time, as in happening now.

That’s enough OT for now.


Population Density

From the game, and whatever I could find on the web, Canada seems to be on the lower side as far as population is concerned. IIRC, a few years back, Canadians invited Indian farmers and gave them large land-holdings for over 100 years for a small pittance. While the link I have shared is from 2006, I read about it online and in newspapers even as late as 2013/2014. The point being, there seems to be a lot of open space in Canada, whereas in India we literally fight for every inch, due to overpopulation. This reminded me of ‘The Mark of Gideon‘. When I was young, I didn’t understand its political meaning, and I still struggle to understand whom the show was talking about. Was it India, Africa or some other continent?

This also becomes obvious when you look at the surface area of the two countries. When I started to learn about Canada, I had no idea, nor a clue, that Canada is three times the size of India. And this when I know India is a large country; seeing that Canada is thrice as large just boggled my mind. As a typical urbanite, I would probably go mad in a rural area of Canada. Montreal, however, seems to be something like Gwalior, or Bangalore before IT stormed in: a place where people can work, play and enjoy quite a few gardens as well.


Trains

This is one thing that is similar in both these great countries: India has Indian Railways, while the Canadians have their own railway, Via Rail. India chugs along on its 68k-kilometre network; Canada is at fourth position with a 52k network. With thrice the land size, Canada should have been somewhere near Russia, or even better. It would be interesting if Canadians could comment on their railway network and why its reach is so limited.

As far as food is concerned, somebody shared this

Also, I have no idea if Canadian trains are as entertaining as Indian ones, in terms of the diverse groups of people as well as the variety of food to eat, as shared a bit in the video. I am also not aware whether Via Rail is the only network operator, or whether there are other operators, unlike Indian Railways which has a monopoly on most operations.

Countries which have first past the post system - Wikipedia

Business houses, Political Families

This is again something that seems similar in both countries (from afar): only a few business houses and, more importantly, political families have governed for years. From what little I could understand, both India and Canada have the first-past-the-post system, which its critics say is unfair to new and small parties. It would be interesting to see if Canada does a re-think. For India, it would need a massive public-education outreach policy and implementation. We just had elections in 5 states of India, including U.P. (large in both area and population density), and for the last several years the EVMs (Electronic Voting Machines) have been used to make sure that nobody can know which party got the most votes in which area. This is to make sure the winning party is not able to take revenge on people or areas which did not vote for them. Instead you have counting of votes by general region, with probably even the Election Commission not knowing which EVM went to which area and what results it holds, in a sort of double-blind methodology.

As far as business houses are concerned, I am guessing it’s the same the world over: only certain people hold the wealth, while the majority of us remain hard-working and non-wealthy.

Northern Lights, Aurora Borealis

Apart from all the social activities that Montreal is famous for, somebody told me that it is possible to see the Northern Lights, the Aurora Borealis, in Canada. I don’t know how true that is; it probably isn’t possible in Montreal itself due to light pollution, but maybe 40-50 km from the city? Can people see it from Canada? If yes, how far would you have to go? Are there any companies or people who take people to see the Northern Lights?

I still have to apply for the bursary and, if that is approved, then try getting the visa. But if that goes through, then apart from DebConf and the social activities happening in and around Montreal (museums, music, etc.), this would be something I would like to experience if it’s possible. I would certainly have to be prepared for the cold, but, no offence to DebConf or anybody else, it would probably be the highlight of the entire trip. This should be labelled the greatest show on earth (TM).

Filed under: Miscellaneous Tagged: #Population Density, #Area size, #Aurora Borealis, #Canada, #Trains, DebConf, India, politics

Cryptogram: Friedman Comments on Yardley

This is William Friedman's highly annotated copy of Herbert Yardley's book, The American Black Chamber.


Planet Debian: Bits from Debian: DebConf17 welcomes its first eighteen sponsors!

DebConf17 logo

DebConf17 will take place in Montreal, Canada in August 2017. We are working hard to provide fuel for hearts and minds, to make this conference once again fertile soil for the Debian Project to flourish. Please join us and support this landmark in the Free Software calendar.

Eighteen companies have already committed to sponsor DebConf17! With a warm welcome, we'd like to introduce them to you.

Our first Platinum sponsor is Savoir-faire Linux, a Montreal-based Free/Open-Source Software company which offers Linux and Free Software integration solutions and actively contributes to many free software projects. "We believe that it's an essential piece [Debian], in a social and political way, to the freedom of users using modern technological systems", said Cyrille Béraud, president of Savoir-faire Linux.

Our first Gold sponsor is Valve, a company developing games, a social entertainment platform, and game-engine technologies. And our second Gold sponsor is Collabora, which offers a comprehensive range of services to help its clients navigate the ever-evolving world of Open Source.

As Silver sponsors we have credativ (a service-oriented company focusing on open-source software and also a Debian development partner), Mojatatu Networks (a Canadian company developing Software Defined Networking (SDN) solutions), the Bern University of Applied Sciences (with over 6,600 students enrolled, located in the Swiss capital), Microsoft (an American multinational technology company), Evolix (an IT managed services and support company located in Montreal), Ubuntu (the OS supported by Canonical) and Roche (a major international pharmaceutical provider and research company dedicated to personalized healthcare).

ISG.EE, IBM, Bluemosh, Univention and Skroutz are our Bronze sponsors so far.

And finally, The Linux Foundation and Réseau Koumbit are our supporter sponsors.

Become a sponsor too!

Would you like to become a sponsor? Do you know of or work in a company or organization that may consider sponsorship?

Please have a look at our sponsorship brochure (or a summarized flyer), in which we outline all the details and describe the sponsor benefits.

For further details, feel free to contact us through, and visit the DebConf17 website at

Cory Doctorow: Read: “Communist Party”: the first chapter of Walkaway

There’s still time to pre-order your signed first-edition hardcover of Walkaway, my novel which comes out on April 25 (US/UK), and while you’re waiting for that to ship, here’s chapter one of the novel, “Communist Party” (this is read by Wil Wheaton on the audiobook, where he is joined by such readers as Amanda Palmer and Amber Benson!).

1. Communist Party


Hubert Vernon Rudolph Clayton Irving Wilson Alva Anton Jeff Harley Timothy Curtis Cleveland Cecil Ollie Edmund Eli Wiley Marvin Ellis Espinoza was too old to be at a Communist party. At twenty-seven, he had seven years on the next oldest partier. He felt the demographic void. He wanted to hide behind one of the enormous filthy machines that dotted the floor of the derelict factory. Anything to escape the frank, flat looks from the beautiful children of every shade and size who couldn’t understand why an old man was creepering around.

“Let’s go,” he said to Seth, who’d dragged him to the party. Seth was terrified of aging out of the beautiful children demographic and entering the world of non-work. He had an instinct for finding the most outré, cutting edge, transgressive goings-on among the children who’d been receding in their rearview mirrors. Hubert, Etc, Espinoza only hung out with Seth because part of his thing about not letting go of his childhood was also not letting go of childhood friends. He was insistent on the subject, and Hubert, Etc was a pushover.

“This is about to get real,” Seth said. “Why don’t you get us beers?”

That was exactly what Hubert, Etc didn’t want to do. The beer was where the most insouciant adolescents congregated, merry and weird as tropical fishes. Each more elfin and tragic than the last. Hubert, Etc remembered that age, the certainty that the world was so broken that only an idiot would deign to acknowledge it or its inevitability. Hubert, Etc often confronted his reflection in his bathroom screen, stared into his eyes in their nest of bruisey bags, and remembered being someone who spent every minute denying the world’s legitimacy, and now he was enmeshed in it. Hubert, Etc couldn’t self-delude the knowledge away. Anyone under twenty would spot it in a second.

“Go on, man, come on. I got you into this party. Least you can do.”

Hubert, Etc didn’t say any obvious things about not wanting to come in the first place and not wanting beer in the second place. There were lots of pointless places an argument with Seth could go. He had his Peter Pan face on, prepared to be ha-ha-only-serious until you wore down, and Hubert, Etc started the night worn.

Walkaway: “Communist Party”

[Cory Doctorow/]

Cory Doctorow: Here’s the schedule for my 25-city US-Canada Walkaway tour!

There are 25 stops in all on the US/Canada tour for WALKAWAY, my next novel, an “optimistic disaster novel” that comes out on April 25 (more stops coming soon, as well as my UK tour dates).

I’ll be joined in various cities by many worthies, from Neal Stephenson (Seattle) to Ed Snowden (New York) to John Scalzi (LA, Santa Cruz, and San Francisco); Amber Benson (LA); Amie Stepanovich (DC); Joi Ito (Cambridge, MA); Max Temkin (Chicago); Brian David Johnson (Phoenix); Andy Baio (Portland, OR) — and more to come!

I hope to see you there, too!

Join Cory Doctorow on His Walkaway Tour, Starting April 25


Sociological Images: Racial and Educational Segregation in the U.S.

Where you grow up is consequential. It plays a critical role in shaping who you are likely to become. Where you live affects your future earnings, how much education you’re likely to receive, how long you live, and much more.

Sociologists who study this are interested in the concentrated accumulations of specific types and qualities of capital (economic, cultural, social) found in abundance in certain locations, in lesser amounts in others, and virtually absent in some. And, as inequalities intersect with one another, marginalization tends to pile up. For instance, those areas of the U.S. that are disproportionately Black and Latino are also areas struggling economically (see Dustin A. Cable’s racial dot map of the U.S.). Similarly, those areas of the country with the least upward mobility are also areas with some of the highest proportions of households of people of color. And, perhaps not shockingly (although it should be), schools in these areas receive fewer resources and have lower outcomes for students.

How much education you receive is, in part, a result of where you grow up. Think about it: you’re more likely to end up with at least a bachelor’s degree if you grow up in an area where almost everyone is at least college educated. It’s not a requirement, but it’s more likely. And, if you do, and go on to live in a similar community and have children, your kids will benefit from you carrying on that cycle as well. Of course, this system of advantages works in reverse for communities with lower levels of educational attainment.

Recently, a geography professor, Kyle Walker, mapped educational attainment in the U.S. Inspired by Cable’s map of racial segregation, Walker visualizes educational inequality in the U.S. from a bird’s eye view. And when we compare Walker’s map of educational attainment to Cable’s map of racial segregation, you can see how inequalities tend to accumulate.

Below, I’ve displayed paired images of a selection of U.S. cities using both maps. In each image, the top map illustrates educational attainment and the bottom visualizes race.

  • On Walker’s map of educational attainment (top images in each pair), the colors indicate: less than high school, high school, some college, bachelor’s degree, and graduate degree.
  • On Cable’s map of racial segregation (bottom images in each pair), the colors indicate: White, Black, Hispanic, Asian, and Other Race/Native American/Multi-Racial.

So, one way of comparing the images below is to look at how the blue areas compare on each map of the same region.  

Below, you can see San Francisco, Berkeley, and San Jose, California in the same frame, using Walker’s map of educational attainment (top) over Cable’s racial dot map (bottom). See how people are segregated by educational attainment (top image) and race (bottom image) in Chicago, Illinois:
Los Angeles, California:
New York City:
Detroit, Michigan:
Houston, Texas:
Comparing regions of the U.S. on Walker’s map with Cable’s racial dot map, you can see how racial and educational inequality intersect. While I only visualized cities above for comparison on both maps, if you examine Walker’s map of educational attainment, two broad trends with respect to segregation by educational attainment are easily visible:

  • Urban/rural divide–people with bachelor’s and graduate degrees tend to be clustered in cities and metropolitan areas.
  • Racial and economic inequalities–within metropolitan areas, you can see educational achievement segregation that both reflects and reinforces racial and economic segregation within the area (this is what you see above).

And, as research has shown, the levels of parents’ educational attainment within an area impacts the educational performances of the children living in that area as well. That’s how social reproduction happens. Sociologists are interested in how inequalities are passed on to subsequent generations. And it is sometimes hard to notice in your daily life because, as you can see above, we’re segregated from one another (by race, education, class, and more). And this segregation is one way interlocking inequalities persist.

Tristan Bridges, PhD is a professor at The College at Brockport, SUNY. He is the co-editor of Exploring Masculinities: Identity, Inequality, Continuity, and Change with C.J. Pascoe and studies gender and sexual identity and inequality. You can follow him on Twitter here. Tristan also blogs regularly at Inequality by (Interior) Design.

(View original at

Worse Than Failure: CodeSOD: Countup Timer

Dan has inherited a pile of Objective-C. That’s not the WTF. The previous developer had some… creative problem solving techniques.

For example, he needed to show a splash screen, and after three seconds, make it vanish. You might be thinking to yourself, “So I set a timer for 3000 milliseconds, and then close the splash screen, right?”

- (void)viewDidLoad {
    [super viewDidLoad];
    timerSplashScreen = [NSTimer scheduledTimerWithTimeInterval:1 target:self selector:@selector(StartLoading) userInfo:nil repeats:YES];
}

- (void)StartLoading {
    if (loadingCounter == 3) {
        [timerSplashScreen invalidate];
        // Close the splash screen
    }
    loadingCounter++;
}

Of course not! You set a timer for 1 second, and then count how many times the timer has fired. When the count hits 3, you can close the splash screen. Oh, for bonus points, increment the count after checking the count, so that way you have a lovely off-by-one bug that means the splash screen stays up for 4 seconds, not 3.


Cryptogram: Security Vulnerabilities in Mobile MAC Randomization

Interesting research: "A Study of MAC Address Randomization in Mobile Devices When it Fails":

Abstract: Media Access Control (MAC) address randomization is a privacy technique whereby mobile devices rotate through random hardware addresses in order to prevent observers from singling out their traffic or physical location from other nearby devices. Adoption of this technology, however, has been sporadic and varied across device manufacturers. In this paper, we present the first wide-scale study of MAC address randomization in the wild, including a detailed breakdown of different randomization techniques by operating system, manufacturer, and model of device. We then identify multiple flaws in these implementations which can be exploited to defeat randomization as performed by existing devices. First, we show that devices commonly make improper use of randomization by sending wireless frames with the true, global address when they should be using a randomized address. We move on to extend the passive identification techniques of Vanhoef et al. to effectively defeat randomization in 96% of Android phones. Finally, we show a method that can be used to track 100% of devices using randomization, regardless of manufacturer, by exploiting a previously unknown flaw in the way existing wireless chipsets handle low-level control frames.

Basically, iOS and Android phones are not very good at randomizing their MAC addresses. And tricks with layer-2 control frames can exploit weaknesses in their chipsets.

Slashdot post.

Planet Linux Australia: Binh Nguyen: Life in Brazil, Random Stuff, and More

- History seems reminiscent of other Latin American nations: a mix of European colonisation and local tribes. It was obviously used for its natural resources, but it's clear that its economy has diversified since then. That said, there are clear issues with corruption and wealth inequality. [Brazil export treemap by product (2014), from Harvard's Atlas of Economic Complexity.] When the Portuguese

Planet Linux Australia: Binh Nguyen: Pre-Cogs and Prophets 10, Random Stuff, and More

- a clear continuation of my other posts on pre-cogs/prophets:

Planet Linux Australia: OpenSTEM: This Week in HASS – term 1, week 8

As we move into the final weeks of term, and the Easter holiday draws closer, our youngest students are looking at different kinds of celebrations in Australia. Students in years 1 to 3 are looking at their global family and students in years 3 to 6 are chasing Aunt Madge around the world, being introduced to Eratosthenes and examining Shadows and Light.

Foundation to Year 3

Our standalone Foundation/Prep students (Unit F.1) are studying celebrations in Australia and thinking about which is their favourite. It may well be Easter with its bunnies and chocolate eggs, which lies just around the corner now! They also get a chance to consider whether we should add any extra celebrations into our calendar in Australia. Those Foundation/Prep students in an integrated class with Year 1 students (Unit F.5), as well as Year 1 (Unit 1.1), 2 (Unit 2.1) and 3 (Unit 3.1) students are investigating where they, and other family members, were born and finding these places on the world map. Students are also examining features of the world map – including the different continents, North and South Poles, the equator and the oceans. Students also get a chance to undertake the Aunt Madge’s Suitcase Activity, in which they follow Aunt Madge around the world, learning about different countries and landmarks, as they go. Aunt Madge’s Suitcase is extremely popular with students of all ages – as it can easily be adapted to cover material at different depths. The activity encourages students to interact with the world map, whilst learning to recognise major natural and cultural landmarks in Australia and around the world.

Years 3 to 6

Aunt Madge

Students in Year 3 (Unit 3.5), who are integrated with Year 4, as well as the Year 4 (Unit 4.1), 5 (Unit 5.1) and 6 (Unit 6.1) students, have moved on to a new set of activities this week. The older students approach the Aunt Madge’s Suitcase Activity in more depth, deriving what items Aunt Madge has packed in her suitcase to match the different climates which she is visiting, as well as delving into each landmark visited in more detail. These landmarks are both natural and cultural and, although several are in Australia, examples are given from around the world, allowing teachers to choose their particular focus each time the activity is undertaken. As well as following Aunt Madge, students are introduced to Eratosthenes. Known as the ‘Father of Geography’, Eratosthenes also calculated the circumference of the Earth. There is an option for teachers to overlap with parts of the Maths curriculum here. Eratosthenes also studied the planets and used shadows and sunlight for his calculations, which provides the link for the Science activities – Shadows and Light, Sundials and Planets of the Solar System.

Next week is the last week of our first term units. By now students have completed the bulk of their work for the term, and teachers are able to assess most of the HASS areas already.



Planet Linux Australia: sthbrx - a POWER technical blog: Erasure Coding for Programmers, Part 1

Erasure coding is an increasingly popular storage technology - allowing the same level of fault tolerance as replication with a significantly reduced storage footprint.

Increasingly, erasure coding is available 'out of the box' on storage solutions such as Ceph and OpenStack Swift. Normally, you'd just pull in a library like ISA-L or jerasure, and set some config options, and you'd be done.

This post is not about that. This post is about how I went from knowing nothing about erasure coding to writing POWER optimised routines to make it go fast. (These are in the process of being polished for upstream at the moment.) If you want to understand how erasure coding works under the hood - and in particular if you're interested in writing optimised routines to make it run quickly in your platform - this is for you.

What are erasure codes anyway?

I think the easiest way to begin thinking about erasure codes is "RAID 6 on steroids". RAID 6 allows you to have up to 255 data disks and 2 parity disks (called P and Q), thus allowing you to tolerate the failure of up to 2 arbitrary disks without data loss.

Erasure codes allow you to have k data disks and m 'parity' or coding disks. You then have a total of m + k disks, and you can tolerate the failure of up to m without losing data.

The downside of erasure coding is that computing what to put on those parity disks is CPU intensive. Let's look at what we put on them.


RAID 6 is the easiest way to get started on understanding erasure codes for a number of reasons. H. Peter Anvin's paper on RAID 6 in the Linux kernel is an excellent start, but dives into the underlying mathematics a bit quickly. So before reading that, read on!

Rings and Fields

As programmers we're pretty comfortable with modular arithmetic - the idea that if you have:

unsigned char a = 255;
a++;

the new value of a will be 0, not 256.

This is an example of an algebraic structure called a ring.

Rings obey certain laws. For our purposes, we'll consider the following incomplete and somewhat simplified list:

  • There is an addition operation.
  • There is an additive identity (normally called 0), such that 'a + 0 = a'.
  • Every element has an additive inverse, that is, for every element 'a', there is an element -a such that 'a + (-a) = 0'
  • There is a multiplication operation.
  • There is a multiplicative identity (normally called 1), such that 'a * 1 = a'.

These operations aren't necessarily addition or multiplication as we might expect from the integers or real numbers. For example, in our modular arithmetic example, we have 'wrap around'. (There are also certain rules the addition and multiplication operations must satisfy - we are glossing over them here.)

One thing a ring doesn't necessarily have is a 'multiplicative inverse'. The multiplicative inverse of some non-zero element of the ring (call it a) is the value b such that a * b = 1. (Often instead of b we write 'a^-1', but that looks bad in plain text, so we shall stick to b for now.)

We do have some inverses in 'mod 256': the inverse of 3 is 171 as 3 * 171 = 513, and 513 = 1 mod 256, but there is no b such that 2 * b = 1 mod 256.
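These claims are easy to check by brute force. A throwaway sketch in C (the name mod256_inverse is mine, not from any library):

```c
#include <stdint.h>

/* Search for b such that a * b = 1 (mod 256); return -1 if no
 * such b exists. The cast to uint8_t performs the mod 256. */
static int mod256_inverse(uint8_t a)
{
    for (int b = 1; b < 256; b++) {
        if ((uint8_t)(a * b) == 1)
            return b;
    }
    return -1;
}
```

Every even number fails the same way 2 does: an even number times anything stays even modulo 256, so the product can never be 1.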

If every non-zero element of our ring had a multiplicative inverse, we would have what is called a field.

Now, let's look at the integers modulo 2, that is, 0 and 1.

We have this for addition:

+ 0 1
0 0 1
1 1 0

Eagle-eyed readers will notice that this is the same as XOR.

For multiplication:

* 0 1
0 0 0
1 0 1

As we said, a field is a ring where every non-zero element has a multiplicative inverse. As we can see, the integers modulo 2 shown above is a field: it's a ring, and 1 is its own multiplicative inverse.

So this is all well and good, but you can't really do very much in a field with 2 elements. This is sad, so we make bigger fields. For this application, we consider the Galois Field with 256 elements - GF(2^8). This field has some surprising and useful properties.

Remember how we said that integers modulo 256 weren't a field because they didn't have multiplicative inverses? I also just said that GF(2^8) has 256 elements, but is a field - i.e., it does have inverses! How does that work?

Consider an element in GF(2^8). There are 2 ways to look at it. The first is to consider it as an 8-bit number. So, for example, let's take 100. We can express that as an 8 bit binary number: 0b01100100.

We can write that more explicitly as a sum of powers of 2:

0 * 2^7 + 1 * 2^6 + 1 * 2^5 + 0 * 2^4 + 0 * 2^3 + 1 * 2^2 + 0 * 2 + 0 * 1
= 2^6 + 2^5 + 2^2

Now the other way we can look at elements in GF(2^8) is to replace the '2's with 'x's, and consider them as polynomials. Each of our bits then represents the coefficient of a term of a polynomial, that is:

0 x^7 + 1 x^6 + 1 x^5 + 0 x^4 + 0 x^3 + 1 x^2 + 0 x + 0 * 1

or more simply

x^6 + x^5 + x^2

Now, and this is important: each of the coefficients is an element of the integers modulo 2: x + x = 2x = 0 as 2 mod 2 = 0. There is no concept of 'carrying' in this addition.

Let's try: what's 100 + 79 in GF(2^8)?

100 = 0b01100100 => x^6 + x^5 +       x^2
 79 = 0b01001111 => x^6 +       x^3 + x^2 + x + 1

100 + 79         =>   0 + x^5 + x^3 +   0 + x + 1
                 =    0b00101011 = 43

So, 100 + 79 = 43 in GF(2^8)

You may notice we could have done that much more efficiently: we can add numbers in GF(2^8) by just XORing their binary representations together. Subtraction, amusingly, is the same as addition: 0 + x = x = 0 - x, as -1 is congruent to 1 modulo 2.
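In C this is as simple as it sounds - a minimal sketch (gf_add is my name for it):

```c
#include <stdint.h>

/* Addition in GF(2^8): add the polynomial coefficients modulo 2,
 * which is exactly a bitwise XOR. Subtraction is the same operation,
 * since -1 is congruent to 1 modulo 2. */
static uint8_t gf_add(uint8_t a, uint8_t b)
{
    return a ^ b;
}
```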

So at this point you might be wanting to explore a few additions yourself. Fortunately there's a lovely tool that will allow you to do that:

sudo apt install gf-complete-tools
gf_add $A $B 8

This will give you A + B in GF(2^8).

> gf_add 100 79 8
43

So, hold on to your hats, as this is where things get really weird. In the modular arithmetic example, we considered the elements of our ring to be numbers, and we performed our addition and multiplication modulo 256. In GF(2^8), we consider our elements as polynomials and we perform our addition and multiplication modulo a polynomial. There is one conventional polynomial used in applications:

0x11d => 0b1 0001 1101 => x^8 + x^4 + x^3 + x^2 + 1

It is possible to use other polynomials if they satisfy particular requirements, but for our applications we don't need to worry as we will always use 0x11d. I am not going to attempt to explain anything about this polynomial - take it as an article of faith.

So when we multiply two numbers, we multiply their polynomial representations. Then, to find out what that is modulo 0x11d, we do polynomial long division by 0x11d, and take the remainder.

Some examples will help.

Let's multiply 100 by 3.

100 = 0b01100100 => x^6 + x^5 + x^2
  3 = 0b00000011 => x + 1

(x^6 + x^5 + x^2)(x + 1) = x^7 + x^6 + x^3 + x^6 + x^5 + x^2
                         = x^7 + x^5 + x^3 + x^2

Notice that some of the terms have disappeared: x^6 + x^6 = 0.

The degree (the largest power of a term) is 7. 7 is less than the degree of 0x11d, which is 8, so we don't need to do anything: the remainder modulo 0x11d is simply x^7 + x^5 + x^3 + x^2.

In binary form, that is 0b10101100 = 172, so 100 * 3 = 172 in GF(2^8).

Fortunately gf-complete-tools also allows us to check multiplications:

> gf_mult 100 3 8
172

Now let's see what happens if we multiply by a larger number. Let's multiply 100 by 5.

100 = 0b01100100 => x^6 + x^5 + x^2
  5 = 0b00000101 => x^2 + 1

(x^6 + x^5 + x^2)(x^2 + 1) = x^8 + x^7 + x^4 + x^6 + x^5 + x^2
                           = x^8 + x^7 + x^6 + x^5 + x^4 + x^2

Here we have an x^8 term, so we have a degree of 8. This means we will get a different remainder when we divide by our polynomial. We do this with polynomial long division, which you will hopefully remember if you did some solid algebra in high school.

x^8 + x^4 + x^3 + x^2 + 1 | x^8 + x^7 + x^6 + x^5 + x^4       + x^2
                          - x^8                   + x^4 + x^3 + x^2 + 1
                          =       x^7 + x^6 + x^5       + x^3       + 1

So we have that our original polynomial (x^8 + x^7 + x^6 + x^5 + x^4 + x^2) is congruent to (x^7 + x^6 + x^5 + x^3 + 1) modulo the polynomial 0x11d. Looking at the binary representation of that new polynomial, we have 0b11101001 = 233.

Sure enough:

> gf_mult 100 5 8
233

Just to solidify the polynomial long division a bit, let's try a slightly larger example, 100 * 9:

100 = 0b01100100 => x^6 + x^5 + x^2
  9 = 0b00001001 => x^3 + 1

(x^6 + x^5 + x^2)(x^3 + 1) = x^9 + x^8 + x^5 + x^6 + x^5 + x^2
                           = x^9 + x^8 + x^6 + x^2

Doing long division to reduce our result:

x^8 + x^4 + x^3 + x^2 + 1 | x^9 + x^8       + x^6                   + x^2
                          - x^9                   + x^5 + x^4 + x^3       + x
                          =       x^8       + x^6 + x^5 + x^4 + x^3 + x^2 + x

We still have a polynomial of degree 8, so we can do another step:

                              x +   1
x^8 + x^4 + x^3 + x^2 + 1 | x^9 + x^8       + x^6                   + x^2
                          - x^9                   + x^5 + x^4 + x^3       + x
                          =       x^8       + x^6 + x^5 + x^4 + x^3 + x^2 + x
                          -       x^8                   + x^4 + x^3 + x^2     + 1
                          =                   x^6 + x^5                   + x + 1

We now have a polynomial of degree less than 8 that is congruent to our original polynomial modulo 0x11d, and the binary form is 0b01100011 = 99.

> gf_mult 100 9 8
99

This process can be done more efficiently, of course - but understanding it step by step will make you much more comfortable with the optimised versions!
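One common shortcut is to interleave the reduction with a shift-and-add multiply, avoiding the explicit long division. A sketch of that standard technique in C (gf_mul is my name; this is not the optimised code the post builds up to):

```c
#include <stdint.h>

/* Multiply two elements of GF(2^8) modulo the polynomial 0x11d.
 * Rather than multiplying the full polynomials and then doing long
 * division, we accumulate a partial product for each set bit of b,
 * reducing by the polynomial whenever an x^8 term would appear, so
 * the intermediate value never exceeds degree 7. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t product = 0;

    while (b) {
        if (b & 1)
            product ^= a;  /* add (XOR) this partial product */
        /* multiply a by x, reducing by 0x11d on overflow of x^8 */
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
        b >>= 1;
    }
    return product;
}
```

This reproduces the worked examples above: gf_mul(100, 3) is 172, gf_mul(100, 5) is 233, and gf_mul(100, 9) is 99.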

I will not try to convince you that all multiplicative inverses exist in this magic shadow land of GF(2^8), but it's important for the rest of the algorithms to work that they do exist. Trust me on this.

Back to RAID 6

Equipped with this knowledge, you are ready to take on RAID6 in the kernel (PDF) sections 1 - 2.

Pause when you get to section 3 - this snippet is a bit magic and benefits from some explanation:

Multiplication by {02} for a single byte can be implemented using the C code:

uint8_t c, cc;
cc = (c << 1) ^ ((c & 0x80) ? 0x1d : 0);

How does this work? Well:

Say you have a binary number 0bNMMMMMMM. Multiplication by 2 gives you 0bNMMMMMMM0, which is 9 bits. Now, there are two cases to consider.

If your leading bit (N) is 0, your product doesn't have an x^8 term, so we don't need to reduce it modulo the irreducible polynomial.

If your leading bit is 1 however, your product is x^8 + something, which does need to be reduced. Fortunately, because we took an 8 bit number and multiplied it by 2, the largest term is x^8, so we only need to reduce it once. So we xor our number with our polynomial to subtract it.

We implement this by letting the top bit overflow out and then XORing the lower 8 bits with the low 8 bits of the polynomial (0x1d).

So, back to the original statement:

(c << 1) ^ ((c & 0x80) ? 0x1d : 0)
    |          |          |     |
    > multiply by 2       |     |
               |          |     |
               > is the high bit set - will the product have an x^8 term?
                          |     |
                          > if so, reduce by the polynomial
                                > otherwise, leave alone

Hopefully that makes sense.

Key points

It's critical you understand the section on Altivec (the vperm stuff), so let's cover it in a bit more detail.

Say you want to do A * V, where A is a constant and V is an 8-bit variable. We can express V as V_a + V_b, where V_a is the top 4 bits of V, and V_b is the bottom 4 bits. A * V = A * V_a + A * V_b

We can then make lookup tables for multiplication by A.

If we did this in the most obvious way, we would need a 256 entry lookup table. But by splitting things into the top and bottom halves, we can reduce that to two 16 entry tables. For example, say A = 02.

V_b A * V_b
00 00
01 02
02 04
... ...
0f 1e

V_a A * V_a
00 00
10 20
20 40
... ...
f0 fd

We then use vperm to look up entries in these tables and vxor to combine our results.

So - and this is a key point - for each A value we wish to multiply by, we need to generate a new lookup table.

So if we wanted A = 03:

V_b A * V_b
00 00
01 03
02 06
... ...
0f 11

V_a A * V_a
00 00
10 30
20 60
... ...
f0 0d
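The split-table trick is easy to sketch in scalar C (the names are mine; the real code performs sixteen of these lookups at once with vperm):

```c
#include <stdint.h>

/* GF(2^8) multiply modulo 0x11d, via shift-and-add. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
        b >>= 1;
    }
    return p;
}

/* Build the two 16-entry tables for a constant A: 'lo' is indexed
 * by the low nibble of V, 'hi' by the high nibble. */
static void build_tables(uint8_t A, uint8_t lo[16], uint8_t hi[16])
{
    for (int i = 0; i < 16; i++) {
        lo[i] = gf_mul(A, (uint8_t)i);        /* A * low-nibble part  */
        hi[i] = gf_mul(A, (uint8_t)(i << 4)); /* A * high-nibble part */
    }
}

/* A * V using only two table lookups and an XOR - the scalar
 * analogue of the vperm/vperm/vxor sequence. */
static uint8_t mul_by_tables(const uint8_t lo[16], const uint8_t hi[16],
                             uint8_t v)
{
    return hi[v >> 4] ^ lo[v & 0x0f];
}
```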

One final thing is that Power8 adds a vpermxor instruction, so we can reduce the entire 4 instruction sequence in the paper:

vsrb v1, v0, v14
vperm v2, v12, v12, v0
vperm v1, v13, v13, v1
vxor v1, v2, v1

to 1 vpermxor:

vpermxor v1, v12, v13, v0

Isn't POWER grand?

OK, but how does this relate to erasure codes?

I'm glad you asked.

Galois Field arithmetic, and its application in RAID 6 is the basis for erasure coding. (It's also the basis for CRCs - two for the price of one!)
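To make the connection concrete, here is a per-byte sketch of the RAID 6 P and Q syndrome computation from the paper, built on the GF(2^8) arithmetic above (an illustration only - real implementations vectorise this across whole blocks):

```c
#include <stdint.h>
#include <stddef.h>

/* Multiply by {02} (i.e. by x) in GF(2^8), reducing by 0x11d. */
static uint8_t gf_mul2(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
}

/* Compute the RAID 6 syndromes for one byte position across k data
 * disks:
 *   P = d_0 + d_1 + ... + d_{k-1}           (plain XOR parity)
 *   Q = d_0 + {02}.d_1 + {02}^2.d_2 + ...   (evaluated by Horner's
 *                                            rule, highest disk first) */
static void raid6_syndromes(const uint8_t *d, size_t k,
                            uint8_t *p, uint8_t *q)
{
    uint8_t P = 0, Q = 0;

    for (size_t i = k; i-- > 0; ) {
        P ^= d[i];
        Q = gf_mul2(Q) ^ d[i];
    }
    *p = P;
    *q = Q;
}
```

Losing any one data disk can be recovered from P alone; recovering from two failures uses both P and Q, which is where those multiplicative inverses come in.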

But, that's all to come in part 2, which will definitely be published before 7 April!

Many thanks to Sarah Axtens who reviewed the mathematical content of this post and suggested significant improvements. All errors and gross oversimplifications remain my own. Thanks also to the OzLabs crew for their feedback and comments.

Planet DebianDirk Eddelbuettel: Rcpp 0.12.10: Some small fixes

The tenth update in the 0.12.* series of Rcpp just made it to the main CRAN repository providing GNU R with by now over 10,000 packages. Windows binaries for Rcpp, as well as updated Debian packages, will follow in due course. This 0.12.10 release follows the 0.12.0 release from late July, the 0.12.1 release in September, the 0.12.2 release in November, the 0.12.3 release in January, the 0.12.4 release in March, the 0.12.5 release in May, the 0.12.6 release in July, the 0.12.7 release in September, the 0.12.8 release in November, and the 0.12.9 release in January --- making it the fourteenth release at the steady and predictable bi-monthly release frequency.

Rcpp has become the most popular way of enhancing GNU R with C or C++ code. As of today, 975 packages on CRAN depend on Rcpp for making analytical code go faster and further. That is up by sixty-nine packages over the two months since the last release -- or just over a package a day!

The changes in this release are almost exclusively minor bugfixes and enhancements to documentation and features: James "coatless" Balamuta rounded out the API, Iñaki Ucar fixed a bug concerning one-character output, Jeroen Ooms allowed for finalizers on XPtr objects, Nathan Russell corrected handling of lower (upper) triangular matrices, Dan Dillon and I dealt with Intel compiler quirks for his algorithm.h header, and I added a C++17 plugin along with some (overdue!) documentation regarding the various C++ standards that are supported by Rcpp (which is in essence whatever your compiler supports, i.e., C++98, C++11, C++14 all the way to C++17 but always keep in mind what CRAN and different users may deploy).

Changes in Rcpp version 0.12.10 (2017-03-17)

  • Changes in Rcpp API:

    • Added new size attribute aliases for number of rows and columns in DataFrame (James Balamuta in #638 addressing #630).

    • Fixed single-character handling in Rstreambuf (Iñaki Ucar in #649 addressing #647).

    • XPtr gains a parameter finalizeOnExit to enable running the finalizer when R quits (Jeroen Ooms in #656 addressing #655).

  • Changes in Rcpp Sugar:

    • Fixed sugar functions upper_tri() and lower_tri() (Nathan Russell in #642 addressing #641).

    • The algorithm.h file now accommodates the Intel compiler (Dirk in #643 and Dan in #645 addressing issue #640).

  • Changes in Rcpp Attributes

    • The C++17 standard is supported with a new plugin (used eg for g++-6.2).
  • Changes in Rcpp Documentation:

    • An overdue explanation of how C++11, C++14, and C++17 can be used was added to the Rcpp FAQ.

Thanks to CRANberries, you can also look at a diff to the previous release. As always, even fuller details are on the Rcpp Changelog page and the Rcpp page which also leads to the downloads page, the browseable doxygen docs and zip files of doxygen output for the standard formats. A local directory has source and documentation too. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Planet DebianPetter Reinholdtsen: Free software archive system Nikita now able to store documents

The Nikita Noark 5 core project is implementing the Norwegian standard for keeping an electronic archive of government documents. The Noark 5 standard documents the requirements for data systems used by the archives in the Norwegian government, and the Noark 5 web interface specification documents a REST web service for storing, searching and retrieving documents and metadata in such an archive. I've been involved in the project since a few weeks before Christmas, when the Norwegian Unix User Group announced it supported the project. I believe this is an important project, and hope it can make it possible for the government archives in the future to use free software to keep the archives we citizens depend on. But as I do not hold such an archive myself, personally my first use case is to store and analyse public mail journal metadata published from the government. I find it useful to have a clear use case in mind when developing, to make sure the system scratches one of my itches.

If you would like to help make sure there is a free software alternative for the archives, please join our IRC channel (#nikita) and the project mailing list.

When I got involved, the web service could store metadata about documents. But a few weeks ago, a new milestone was reached when it became possible to store full text documents too. Yesterday, I completed an implementation of a command line tool archive-pdf to upload a PDF file to the archive using this API. The tool is very simple at the moment, and finds existing fonds, series and files while asking the user to select which one to use if more than one exists. Once a file is identified, the PDF is associated with the file and uploaded, using the title extracted from the PDF itself. The process is fairly similar to visiting the archive, opening a cabinet, locating a file and storing a piece of paper in the archive. Here is a test run directly after populating the database with test data using our API tester:

~/src//noark5-tester$ ./archive-pdf mangelmelding/mangler.pdf
using arkiv: Title of the test fonds created 2017-03-18T23:49:32.103446
using arkivdel: Title of the test series created 2017-03-18T23:49:32.103446

 0 - Title of the test case file created 2017-03-18T23:49:32.103446
 1 - Title of the test file created 2017-03-18T23:49:32.103446
Select which mappe you want (or search term): 0
Uploading mangelmelding/mangler.pdf
  PDF title: Mangler i spesifikasjonsdokumentet for NOARK 5 Tjenestegrensesnitt
  File 2017/1: Title of the test case file created 2017-03-18T23:49:32.103446

You can see here how the fonds (arkiv) and series (arkivdel) only had one option, while the user needs to choose which file (mappe) to use among the two created by the API tester. The archive-pdf tool can be found in the git repository for the API tester.

In the project, I have been mostly working on the API tester so far, while getting to know the code base. The API tester currently uses the HATEOAS links to traverse the entire exposed service API and verify that the exposed operations and objects match the specification, as well as trying to create objects holding metadata and uploading a simple XML file to store. The tester has proved very useful for finding flaws in our implementation, as well as flaws in the reference site and the specification.

The test document I uploaded is a summary of all the specification defects we have collected so far while implementing the web service. There are several unclear and conflicting parts of the specification, and we have started writing down the questions we get from implementing it. We use a format inspired by how The Austin Group collect defect reports for the POSIX standard with their instructions for the MANTIS defect tracker system, for lack of an official way to structure defect reports for Noark 5 (our first submitted defect report was a request for a procedure for submitting defect reports :).

The Nikita project is implemented using Java and Spring, and is fairly easy to get up and running using Docker containers for those that want to test the current code base. The API tester is implemented in Python.

Planet DebianClint Adams: Measure once, devein twice

Ophira lived in a wee house in University Square, Tampa. It had one floor, three bedrooms, two baths, a handful of family members, a couple pets, some plants, and an occasional staring contest.

Mauricio lived in Lowry Park North, but Ophira wasn’t allowed to go there because Mauricio was afraid that someone would tell his girlfriend. Ophira didn’t like Mauricio’s girlfriend and Mauricio’s girlfriend did not like Ophira.

Mauricio did not bring his girlfriend along when he and Ophira went to St. Pete Beach. They frolicked in the ocean water, and attempted to have sex. Mauricio and Ophira were big fans of science, so somewhat quickly they concluded that it is impossible to have sex underwater, and absconded to Ophira’s car to have sex therein.

“I hate Mauricio’s girlfriend,” Ophira told Amit on the telephone. “She’s not even pretty.”

“Hey, listen,” said Amit. “I’m going to a wedding on Captiva.”

“Oh, my family used to go to Captiva every year. There’s bioluminescent algae and little crabs and stuff.”

“Yeah? Do you want to come along? You could pick me up at the airport.”

“Why would I want to go to a wedding?”

“Well, it’s on the beach and they’re going to have a bouncy castle.”

“A bouncy castle‽ Are you serious?”


“Well, okay.”

Amit prepared to go to the wedding and Ophira became terse then unresponsive. After he landed at RSW, he called Ophira, but instead of answering the phone she startled and fell out of her chair. Amit arranged for other transportation toward the Sanibel Causeway. Ophira bit her nails for a few hours, then went to her car and drove to Cape Coral.

Ophira cruised around Cape Coral for a while, until she spotted a teenager cleaning a minivan. She parked her car and approached him.

“Whatcha doing?” asked Ophira, pretending to chew on imaginary gum.

The youth slid the minivan door open. “I’m cleaning,” he said hesitantly.

“Didn’t your parents teach you not to talk to strangers? I could do all kinds of horrible things to you.”

They conversed for a bit. She recounted a story of her personal hero, a twelve-year-old girl who seduced and manipulated older men into ruin. She rehashed the mysteries of Mauricio’s girlfriend. She waxed poetic on her love of bouncy castles. The youth listened, hypnotized.

“What’s your name, kid?” Ophira yawned.

“Arjun,” he replied.

“How old are you?”

Arjun thought about it. “15,” he said.

“Hmm,” Ophira stroked her chin. “Can you sneak me into your room so that your parents never find out about it?”

Arjun’s eyes went wide.

MEANWHILE, on Captiva Island, Amit had learned that even though the Tenderly had multiple indoor jacuzzis, General Fitzpatrick and Mrs. Fitzpatrick had decided it prudent to have sex in the hot tub on the deck; that the execution of this plan had somehow necessitated a lengthy cleaning process before the hot tub could be used again; that that’s why workmen were cleaning the hot tub; and that the Fitzpatrick children had gotten General Fitzpatrick and Mrs. Fitzpatrick to agree to not do that again, with an added suggestion that they not be seen doing anything else naked in public.

A girl walked up to Amit. “Hey, I heard you lost your plus-one. Are you here alone? What a loser!” she giggled nervously, then stared.

“Leave me alone, Darlene,” sighed Amit.

Darlene’s face reddened as she spun on her heels and stormed over to Lisette. “Oh my god, did you see that? I practically threw myself at him and he was abusive toward me. He probably has all the classic signs of being an abuser. Did you hear about that girl he dated in Ohio? I bet I know why that ended.”

“Oh really?” said Lisette distractedly, looking Amit up and down. “So he’s single now?”

Darlene glared at Lisette as Amit wandered back outside to stare at the hot tub.

“Hey kid,” said Ophira, “bring me some snacks.”

“I don’t bring food into my room,” said Arjun. “It attracts pests.”

“Is that what your parents told you?” scoffed Ophira. “Don’t be such a wuss.”

Three minutes later, Ophira was finishing a bag of paprika puffs. “These are great, Arjun! Where do you get these?”

“My cousin sends them from Europe,” he explained.

“Now get me a diet soda.”

Amit strolled along the beach, then yelped. “What’s biting my legs?” he cried out.

“Those are sand fleas,” said Nessarose.

“What are sand fleas?” asked Amit incredulously.

Nessarose rolled her eyes. “Stop being a baby and have a drink.”

After the sun went down, Amit began to notice the crabs, and this made him drink more.

When everyone was soused, General Fitzpatrick announced that they were going for a swim in the Gulf, in direct contravention of safety guidelines. Most of the guests were wise enough to refuse, but an eightsome swam out, occasionally stopping to slap the algae, but continuing until they reached the sandbar that General Fitzpatrick correctly claimed was there.

Then screams echoed through the night as all the jellyfish attacked everyone invading their sandbar.

The crestfallen swimming party eventually made it back to shore.

“Pee on the jellyfish sting,” commanded Nessarose. “It’s the best cure.”

“No!” shouted General Fitzpatrick’s daughter. “Urine makes it worse.”

Things quickly escalated from Nessarose and General Fitzpatrick’s daughter screaming at each other to the beach dividing into three factions: those siding with Nessarose, those siding with General Fitzpatrick’s daughter, and those who had no idea what was going on. General Fitzpatrick had no interest in any of this, and went straight to bed.

“It’s getting late, kid,” said Ophira. “I’m taking your bed.”

“What?” squeaked Arjun.

“Look,” said Ophira, “your bed is small and there isn’t room for both of us. You may sleep on the floor if you’re quiet and don’t bother me.”

“What?” squeaked Arjun.

“Are you deaf, kid?” Ophira grunted and then went to bed.

Arjun blinked in confusion, then tried to fall asleep on the floor, without much success.

Ophira got up in the morning and said, “Before I go, I want to teach you a valuable lesson.”

“What?” groaned Arjun, getting to his feet.

“You should be careful talking to strangers. Now, I told you that I could do horrible things to you, so this is not my fault; it’s yours,” she announced, then sucker-punched him in the gut.

Ophira climbed out the window as Arjun doubled over.

As the ceremony began, only a small minority of the wedding party was visibly suffering from jellyfish stings, which may or may not have helped with ignoring the sand fleas.

The ceremony ended shortly thereafter, and now that marriage had been accomplished, everyone turned their attention to food and drink and swimming less irresponsibly than the night before. Guests that needed to return home sooner departed in waves and Amit started to appreciate the more peaceful environment.

He heard the deck door slide open behind him and turned his attention away from the hot tub.

“Hey, mofo,” Ophira shouted as she strode stylishly out onto the deck. “Where’s this bouncy castle?”

Amit blinked in surprise. “That was yesterday. You missed it.”

“Oh,” she frowned. “So I met this South Slav guy with a really sexy forehead, and I need some advice. I don’t know if I should call him or wait.”

Amit pointed to the hot tub and told her the story of General Fitzpatrick and Mrs. Fitzpatrick and the hot tub.

“What?” said Ophira. “How could they have sex underwater?”

“What do you mean?” asked Amit.

“Well, it’s impossible,” she replied.

Posted on 2017-03-19
Tags: mintings


Planet DebianVincent Sanders: A rose by any other name would smell as sweet

Often I end up dealing with code that works but might not be of the highest quality. While quality is subjective, I like to use the idea of "code smell" to convey what I mean: a list of indicators that, taken together, help to identify code that might benefit from some improvement.

Such smells may include:
  • Complex code lacking comments on intended operation
  • Code lacking API documentation comments especially for interfaces used outside the local module
  • Not following style guide
  • Inconsistent style
  • Inconsistent indentation
  • Poorly structured code
  • Overly long functions
  • Excessive use of pre-processor
  • Many nested loops and control flow clauses
  • Excessive numbers of parameters
I am most certainly not alone in using this approach and Fowler et al have covered this subject in the literature much better than I can here. One point I will raise though is that some programmers dismiss code that exhibits these traits as "legacy" and immediately suggest a fresh implementation. There are varying opinions on when a rewrite is the appropriate solution, ranging from never to always, but in my experience making the old working code smell nice is almost always less effort and risk than a re-write.


When I come across smelly code, and I decide it is worthwhile improving it, I often discover the biggest smell is lack of test coverage. Now do remember this is just one code smell and on its own might not be indicative, my experience is smelly code seldom has effective test coverage while fresh code often does.

Test coverage is generally understood to be the percentage of source code lines and decision paths used when instrumented code is exercised by a set of tests. Like many metrics developer tools produce, "coverage percentage" is often misused by managers as a proxy for code quality. Both Fowler and Marick have written about this but sufficient to say that for a developer test coverage is a useful tool but should not be misapplied.

Although refactoring without tests is possible, the chances of unintended consequences are proportionally higher. I often approach such a refactor by enumerating all the callers and constructing a description of the used interface beforehand, then checking that that interface is not broken by the refactor. At which point it is probably worth writing a unit test to automate the checks.

Because of this I have changed my approach to such refactoring to start by ensuring there is at least basic API code coverage. This may not yield the fashionable 85% coverage target but is useful and may be extended later if desired.

It is widely known and equally widely ignored that for maximum effectiveness unit tests must be run frequently and developers take action to rectify failures promptly. A test that is not being run or acted upon is a waste of resources both to implement and maintain which might be better spent elsewhere.

For projects I contribute to frequently I try to ensure that the CI system is running the coverage target, and hence the unit tests, which automatically ensures any test breaking changes will be highlighted promptly. I believe the slight extra overhead of executing the instrumented tests is repaid by having the coverage metrics available to the developers to aid in spotting areas with inadequate tests.


A short example will help illustrate my point. When a web browser receives an object over HTTP the server can supply a MIME type in a content-type header that helps the browser interpret the resource. However this meta-data is often problematic (sorry that should read "a misleading lie") so the actual content must be examined to get a better answer for the user. This is known as mime sniffing and of course there is a living specification.

The source code that provides this API (Linked to it rather than included for brevity) has a few smells:
  • Very few comments of any type
  • The API are not all well documented in its header
  • A lot of global context
  • Local static strings which should be in the global string table
  • Pre-processor use
  • Several long functions
  • Exposed API has many parameters
  • Exposed API uses complex objects
  • The git log shows the code has not been significantly updated since its implementation in 2011 but the spec has.
  • No test coverage
While some of these are obvious, the non-use of the global string table and the API complexity needed detailed knowledge of the codebase, just to highlight how subjective the sniff test can be. There is also one huge air freshener in all of this, which definitely comes from experience, and that is the module's author. Their name at the top of this would ordinarily be cause for me to move on, but I needed an example!

First thing to check is the API use

$ git grep -i -e mimesniff_compute_effective_type --or -e mimesniff_init --or -e mimesniff_fini
content/hlcache.c: error = mimesniff_compute_effective_type(handle, NULL, 0,
content/hlcache.c: error = mimesniff_compute_effective_type(handle,
content/hlcache.c: error = mimesniff_compute_effective_type(handle,
content/mimesniff.c:nserror mimesniff_init(void)
content/mimesniff.c:void mimesniff_fini(void)
content/mimesniff.c:nserror mimesniff_compute_effective_type(llcache_handle *handle,
content/mimesniff.h:nserror mimesniff_compute_effective_type(struct llcache_handle *handle,
content/mimesniff.h:nserror mimesniff_init(void);
content/mimesniff.h:void mimesniff_fini(void);
desktop/netsurf.c: ret = mimesniff_init();
desktop/netsurf.c: mimesniff_fini();

This immediately shows me that this API is used in only a very small area. That is often not the case, but the general approach still applies.

After a little investigation, the usage is effectively that the mimesniff_init API must be called before the mimesniff_compute_effective_type API, and mimesniff_fini releases the initialised resources.

A simple test case was added to cover the API. It exercised the behaviour both when init was called before the computation and when it was not, along with some simple tests for a limited number of well-behaved inputs.
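The shape of that lifecycle test is easy to sketch. This is illustrative Python rather than the actual C test harness; the function names are stand-ins mirroring the module's API, and the sniffing logic is a dummy:

```python
# Illustrative sketch (Python stand-ins, not the actual C test
# harness) of the init -> compute -> fini lifecycle test.
class MimesniffError(Exception):
    pass

_initialised = False

def mimesniff_init():
    global _initialised
    _initialised = True

def mimesniff_fini():
    global _initialised
    _initialised = False

def mimesniff_compute_effective_type(data):
    # Stand-in for the real sniffing logic; the point is the guard.
    if not _initialised:
        raise MimesniffError("mimesniff_init has not been called")
    return "text/plain"

# Without init the call must fail cleanly, not crash.
try:
    mimesniff_compute_effective_type(b"hello")
    raise AssertionError("expected an error before init")
except MimesniffError:
    pass

# With the documented init -> compute -> fini sequence it succeeds.
mimesniff_init()
assert mimesniff_compute_effective_type(b"hello") == "text/plain"
mimesniff_fini()
```

The value of even a test this small is that it pins down the lifecycle contract before any refactoring starts.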

By changing to the global string table, the initialisation and finalisation APIs can be removed altogether, along with a large amount of global context and pre-processor macros. This single change removes a lot of smell from the module and raises test coverage, both because the global string table already has good coverage and because there are now many fewer lines and conditionals to check in the mimesniff module.

I stopped the refactor at this point, but were this more than an example I probably would have:
  • made the compute_effective_type interface simpler, with fewer, simpler parameters
  • ensured a solid set of test inputs
  • examined using a fuzzer to get a better test corpus
  • added documentation comments
  • updated the implementation to the 2017 specification


The approach examined here reduces the smell of code in an incremental, testable way, improving the codebase going forward. This is mainly necessary on larger, complex codebases, where technical debt and bit-rot are real issues that can quickly overwhelm a codebase if not kept in check.

This technique is subjective, but it helps a programmer to quantify and examine a piece of code in a structured fashion. However, it is only a tool; it should not be over-applied, nor used as a proxy metric for code quality.

Planet Linux AustraliaDavid Rowe: Codec 2 700C and Short LDPC Codes

In the last blog post I evaluated FreeDV 700C over the air. This week I’ve been simulating the use of short LDPC FEC codes with Codec 2 700C over AWGN and HF channels.

In my HF Digital Voice work to date I have shied away from FEC:

  1. We didn’t have the bandwidth for the extra bits required for FEC.
  2. Modern, high-performance codes tend to have large block sizes (thousands of bits), which leads to large latency (several seconds) when applied to low bit rate speech.
  3. The error rates we are interested in (e.g. 10% raw, 1% after the FEC decoder) are unusual – many codes don't work well at these rates.

However, with Codec 2 pushed down to 700 bit/s we now have enough bandwidth for a rate 1/2 code inside a standard 2kHz SSB channel. Over coffee a few weeks ago, Bill VK5DSP offered to develop some short LDPC codes for me specifically for this application. He sent me an Octave simulation of rate 1/2 and 2/3 codes of length 112 and 56 bits. Codec 2 700C has 28 bit frames, so this corresponds to 4 or 2 Codec 2 700C frames, which would introduce a latency of between 80 and 160ms – quite acceptable for Push To Talk (PTT) radio.
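The latency arithmetic is straightforward — a quick sketch using the figures above:

```python
# Sketch of the buffering latency: the candidate codes protect 56 or
# 112 data bits, i.e. 2 or 4 Codec 2 700C frames of 28 bits each.
CODEC_FRAME_BITS = 28
CODEC_BIT_RATE = 700  # bit/s

for code_data_bits in (56, 112):
    frames = code_data_bits // CODEC_FRAME_BITS
    latency_ms = 1000 * code_data_bits / CODEC_BIT_RATE
    print(f"{code_data_bits}-bit code: {frames} frames, {latency_ms:.0f} ms")
# 56-bit code: 2 frames, 80 ms
# 112-bit code: 4 frames, 160 ms
```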

I re-factored Bill’s simulation code to produce ldpc_short.m. This measures BER and PER for Bill’s short LDPC codes, and also plots curves for theoretical, HF multipath channels, a Golay (24,12) code, and the current diversity scheme used in FreeDV 700C.

To check my results I compared the Golay BER and ideal HF multipath (Rayleigh fading) channel curves to other people's work. It is always a good idea to spot-check a few values and make sure they are sensible. I took a simple approach to get results in a reasonable amount of coding time (about 1 day of work in this case). This simulation runs at the symbol rate and assumes ideal synchronisation. My other modem work (i.e. experience) lets me move back and forth between this sort of simulation and real-world modems, for example accounting for synchronisation losses.

Error Distribution and Packet Error Rate

I had an idea that Packet Error Rate (PER) might be important. Without FEC, bit errors are scattered randomly about. At our target 1% BER, many frames will have 1 or 2 bit errors. As discussed in the last post, Codec 2 700C is sensitive to bit errors, as "every bit counts". For example, one bit error in the Vector Quantiser (VQ) index (a big look-up table) can throw the speech spectrum right off.

However, an LDPC decoder will tend to correct all errors in a codeword, or "die trying" (i.e. fail badly). So an average output BER of say 1% will consist of a bunch of perfect frames, plus a completely trashed one every now and again. Digital voice works better with this style of error pattern than with a few random errors in each codec packet. So for a given BER, a system that delivers a lower PER is better for our application. I've guesstimated a 10% PER target for intelligible low bit rate speech. Let's see how that works out…
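The difference between scattered and bursty errors is easy to quantify. A quick sketch, assuming independent bit errors for the uncoded case, and a guessed 50% internal BER inside a failed codeword for the bursty case (that 50% figure is my illustrative assumption, not a measured value):

```python
# With independent bit errors, P(frame OK) = (1 - BER)^n,
# so PER = 1 - (1 - BER)^n for an n-bit frame.
BER = 0.01
n = 28  # Codec 2 700C frame size in bits

per_random = 1 - (1 - BER) ** n
print(f"Random errors: PER = {per_random:.1%}")  # ~24.5%

# If the same 1% average BER instead arrives as rare trashed
# codewords (the LDPC "die trying" behaviour), with an assumed
# ~50% of bits wrong inside a failed frame, the PER is roughly:
per_burst = BER / 0.5
print(f"Bursty errors: PER = {per_burst:.1%}")  # ~2%
```

So for the same average BER, concentrating the errors into occasional dead frames delivers an order of magnitude fewer damaged codec frames — which is exactly why PER matters here.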


Here are the BER and PER curves for an AWGN channel:

Here are the same curves for HF (multipath fading) channel:

I’ve included a Golay (24,12) block code (hard decision) and uncoded PSK for comparison to the AWGN curves, and the diversity system on the HF curves. The HF channel is modelled as two paths with 1Hz Doppler spread and a 1ms delay.

The best LDPC code reaches the 1% BER/10% PER point at 2dB Eb/No (AWGN) and 6dB (HF multipath). Comparing BER, the coding gain is 2.5 and 3dB (AWGN and HF). Comparing PER, the coding gain is 3 and 5dB (AWGN and HF).

Here is a plot of the error pattern over time using the LDPC code on a HF channel at Eb/No of 6dB:

Note the errors are confined to short bursts – isolated packets where the decoder fails. Even though the average BER is 1%, most of the speech is error free. This is a very nice error distribution for digital speech.

Speech Samples

Here are some speech samples, comparing the current diversity scheme used for FreeDV 700C to LDPC, over the AWGN and HF channels. These were simulated by extracting the error pattern from the simulation, then inserting these errors into a Codec 2 700C bit stream (see the command lines section below).

AWGN Eb/No 2dB Diversity LDPC
HF Eb/No 6dB Diversity LDPC

Next Steps

These results are very encouraging and suggest a gain of 2 to 5dB over FreeDV 700C, and better error distribution (lower PER). Next step is to develop FreeDV 700D – a real world implementation using the 112 data-bit rate 1/2 LDPC code. This will require 4 frames of buffering, and some sort of synchronisation to determine the 112 bit frame boundaries. Fortunately much of the C code for these LDPC codes already exists, as it was developed for the Wenet High Altitude Balloon work.

If most frames at the decoder input are now error free, we can consider more efficient (but less robust) techniques for Codec 2, such as prediction (delta coding). This will decrease the codec bit rate for a given speech quality. We could then choose to reduce our bit rate (making the system more robust for a given channel SNR), or raise speech quality while maintaining the same bit rate.

Command Lines

To generate the decoded speech, first run the Octave ldpc_short simulation to generate an error pattern file, then subject the Codec 2 700C bit stream to these error patterns:

octave:67> ldpc_short
$ ./c2enc 700C ../../raw/ve9qrp_10s.raw - | ./insert_errors - - ../../octave/awgn_2dB_ldpc.err 28 | ./c2dec 700C - - | aplay -f S16_LE -

The simulation generates .eps files, as direct generation of PNG leads to font size issues. Converting EPS to PNG without a transparent background:

mogrify -resize 700x600 -density 300 -flatten -format png *.eps

However I still feel the images are a bit fuzzy, especially the text. Any ideas? Here's the eps file if someone would like to try to get a nicer PNG conversion for me! The EPS file looks great at any scaling when I render it using the Ubuntu document viewer.

Update: A friend of mine (Erich) has suggested using GIMP for the conversion. This does seem to work well and has options for text and line anti-aliasing. It would be nice to be able to generate nice PNGs directly from Octave – my best approach so far is to capture screen shots.


LowSNR site Bill VK5DSP writes about his experiments in low SNR communications.

Wenet High Altitude Balloon SSDV System developed with Mark VK5QI and Bill VK5DSP that uses LDPC codes.

LDPC using Octave and the CML library

FreeDV 700C

Codec 2 700C

Planet Linux AustraliaOpenSTEM: St Patrick’s Day 2017 – and a free resource on Irish in Australia

Happy St Patrick’s day!

Slane Abbey

And "we have a resource on that" – that is, on the Irish in Australia and the major contributions they made since the very beginning of the colonies. You can get that lovely 5 page resource PDF for free if you check out using coupon code TRYARESOURCE. It's an option we've recently put in place so anyone can grab one resource of their choice to see if they like our materials and assess their quality.

St Patrick statue – Slane Abbey

View of Tara from Slane Abbey

Back to St Patrick: we were briefly in Ireland last year, and near Dublin we drove past a ruin at the top of a hill that piqued our interest, so we stopped and had a look. It turned out to be Slane Abbey, the site where, it is believed, the first Christian missionary to Ireland, later known as St Patrick, lit a large (Easter) celebration fire on the Hill of Slane in 433 AD. With this action he (unwittingly?) contravened orders by King Laoghaire at nearby Tara. The landscape photo past the Celtic cross shows the view towards Tara. Ireland is a beautiful country, with a rich history.

Slane Abbey – info plaque

Photos by Arjen Lentz & Dr Claire Reeler


Krebs on SecurityGovt. Cybersecurity Contractor Hit in W-2 Phishing Scam

Just a friendly reminder that phishing scams which spoof the boss and request W-2 tax data on employees are intensifying as tax time nears. The latest victim shows that even cybersecurity experts can fall prey to these increasingly sophisticated attacks.

On Thursday, March 16, the CEO of Defense Point Security, LLC — a Virginia company that bills itself as “the choice provider of cyber security services to the federal government” — told all employees that their W-2 tax data was handed directly to fraudsters after someone inside the company got caught in a phisher’s net.

Alexandria, Va.-based Defense Point Security (recently acquired by management consulting giant Accenture) informed current and former employees this week via email that all of the data from their annual W-2 tax forms — including name, Social Security Number, address, compensation, tax withholding amounts — were snared by a targeted spear phishing email.

“I want to alert you that a Defense Point Security (DPS) team member was the victim of a targeted spear phishing email that resulted in the external release of IRS W-2 Forms for individuals who DPS employed in 2016,” Defense Point CEO George McKenzie wrote in the email alert to employees. “Unfortunately, your W-2 was among those released outside of DPS.”

W-2 scams start with spear phishing emails usually directed at finance and HR personnel. The scam emails will spoof a request from the organization’s CEO (or someone similarly high up in the organization) and request all employee W-2 forms.

Defense Point did not return calls or emails seeking comment. An Accenture spokesperson issued the following brief statement:  “Data protection and our employees are top priorities. Our leadership and security team are providing support to all impacted employees.”

The email that went out to Defense Point employees Thursday does not detail when this incident occurred, to whom the information was sent, or how many employees were impacted. But a review of information about the company on LinkedIn suggests the breach letter likely was sent to around 200 to 300 employees nationwide (if we count past employees also).

Among Defense Point’s more sensitive projects is the U.S. Immigration and Customs Enforcement (ICE) Security Operations Center (SOC) based out of Phoenix, Ariz. That SOC handles cyber incident response, vulnerability mitigation, incident handling and cybersecurity policy enforcement for the agency.

Fraudsters who perpetrate tax refund fraud prize W-2 information because it contains virtually all of the data one would need to fraudulently file someone’s taxes and request a large refund in their name. Scammers in tax years past also have massively phished online payroll management account credentials used by corporate HR professionals. This year, they are going after people who run tax preparation firms, and W-2’s are now being openly sold in underground cybercrime stores.

Tax refund fraud affects hundreds of thousands, if not millions, of U.S. citizens annually. Victims usually first learn of the crime after having their returns rejected because scammers beat them to it. Even those who are not required to file a return can be victims of refund fraud, as can those who are not actually due a refund from the IRS.


I find it interesting that a company which obviously handles extremely sensitive data on a regular basis, and one that manages security operations for a highly politicized government agency, would not anticipate such attacks and deploy some kind of data-loss prevention (DLP) technology to stop sensitive information from leaving their networks.

Thanks to their mandate as an agency, ICE is likely a high risk target for hacktivists and nation-state hackers. This was not a breach in which data was exfiltrated through stealthy means; the tax data was sent by an employee openly through email. This suggests that either there were no DLP technical controls active in their email environment, or they were inadequately configured to prevent information in SSN format from leaving the network.
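As a purely hypothetical illustration (nothing is known about DPS's actual email setup), even a naive outbound content check can flag SSN-formatted strings before a message leaves the network — which is the kind of control the paragraph above is describing:

```python
import re

# Naive illustration of a DLP-style content check: flag outbound text
# containing strings in NNN-NN-NNNN (SSN) format.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def contains_ssn(text: str) -> bool:
    """Return True if the text contains an SSN-formatted string."""
    return bool(SSN_PATTERN.search(text))

print(contains_ssn("W-2 for employee, SSN 078-05-1120"))  # True
print(contains_ssn("Invoice #123-45 totals $6789"))       # False
```

Real DLP products are far more sophisticated (checksum validation, context scoring, attachment parsing), but even a filter this crude would have tripped on a bulk W-2 export.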

This incident also suggests that perhaps Defense Point does not train their employees adequately in information security, and yet they are trusted to maintain the security environment for a major government agency. This from a company that sells cybersecurity education and training as a service to others.


While there isn’t a great deal you can do to stop someone at your employer from falling for one of these W-2 phishing scams, here are some steps you can take to make it less likely that you will be the next victim of tax refund fraud:

-File before the fraudsters do it for you – Your primary defense against becoming the next victim is to file your taxes at the state and federal level as quickly as possible. Remember, it doesn’t matter whether or not the IRS owes you money: Thieves can still try to impersonate you and claim that they do, leaving you to sort out the mess with the IRS later.

-Get on a schedule to request a free copy of your credit report. By law, consumers are entitled to a free copy of their report from each of the major bureaus once a year. Put it on your calendar to request a copy of your file every three to four months, each time from a different credit bureau. Dispute any unauthorized or suspicious activity. This is where credit monitoring services are useful: Part of their service is to help you sort this out with the credit bureaus, so if you’re signed up for credit monitoring make them do the hard work for you.

-File form 14039 and request an IP PIN from the government. This form requires consumers to state they believe they’re likely to be victims of identity fraud. Even if thieves haven’t tried to file your taxes for you yet, virtually all Americans have been touched by incidents that could lead to ID theft — even if we just look at breaches announced in the past year alone.

-Consider placing a “security freeze” on one’s credit files with the major credit bureaus. See this tutorial about why a security freeze — also known as a “credit freeze” — may be more effective than credit monitoring in blocking ID thieves from assuming your identity to open up new lines of credit. While it’s true that having a security freeze on your credit file won’t stop thieves from committing tax refund fraud in your name, it would stop them from fraudulently obtaining your IP PIN.

-Monitor, then freeze. Take advantage of any free credit monitoring available to you, and then freeze your credit file with the four major bureaus. Instructions for doing that are here.

CryptogramFriday Squid Blogging: Squid Catches Down in Argentina

News from the South Atlantic:

While the outlook is good at present, it is too early to predict what the final balance of this season will be. The sector is totally aware that the 2016 harvest started well, but then it registered a strong decline.

Last year only 60,315 tonnes of Illex squid were landed, well below the 126,670 tonnes landed in 2015 and the 168,729 tonnes recorded in 2014.

As usual, you can also use this squid post to talk about the security stories in the news that I haven't covered.

Planet DebianShirish Agarwal: Science Day at GMRT, Khodad 2017

The whole team posing at the end of day 2

The above picture is a blend of the two communities, the FOSS community and Mozilla India. Unless you were there, you wouldn’t know who is from which community, which is what FOSS is all about. But as always, I’m getting a bit ahead of myself.

Akshat, who works at NCRA as a programmer (the standing guy on the left), shared with me in January this year that this year too we should have two stalls, the FOSS community and Mozilla India stalls, next to each other. While we had the banners, we were missing stickers and flyers. Funds were and are always an issue, and this year too it would have been emptier if we didn’t have some money saved from last year’s MiniDebConf 2016 that we had in Mumbai. Our major expenses included printing stickers, stationery and flyers, which came to around INR 5000/-, and a couple of LCD TV monitors, which came to around INR 2k/- as rent. All the labour was voluntary in nature, but both me and Akshat easily spent up to 100 hours each before the event. Next year, we want to raise around INR 10-15k so we can buy 1 or 2 LCD monitors and not have to think about funds for the next couple of years. How we will do that I have no idea atm.

Printing leaflets

Me and Akshat did all the printing and stationery runs, and hence I had not been using my lappy for about 3-4 days.

Come the evening before the event, and the laptop would not start. Coincidentally or not, a few months ago, and even at last year’s DebConf, people had commented on IBM/Lenovo’s obsession with proprietary power cords and adaptors. I hadn’t given it much thought, but when I got no power even after putting it on AC power for 3-4 hours, I looked it up on the web and saw that the power cords and power adaptors are all different, even within the T440 range and existing models. In fact I couldn’t find mine, hence sharing it via the pictures below.

thinkpad power cord male

thinkpad power adaptor female

I knew/suspected that thinkpads would be rare where I was going; it would be rarer still to find the exact power cord, and I was unsure whether the fault lay with the power cord, the adaptor, whatever goes for the SMPS in a laptop, the memory, or the motherboard/CPU itself. I did look up the documentation and was surprised at the extensive remote troubleshooting documentation that Lenovo has.

I did the usual: take out the battery, put it back in, twiddle with the little hole in the bottom of the laptop, try to switch on without the battery on AC mains, try to switch on with battery power only — but nothing worked. A couple of hours went by, and with a resigned thought I went to bed, convincing myself that anyway it’s good I am not taking the lappy, as it is extra-dusty there and who needs a dead laptop anyway.

Update – After the event was over, I contacted Lenovo support, and within a week, with one visit from a service engineer, he was able to identify that a faulty cable was the culprit, and not the other things I was afraid of. Another week went by and Lenovo replaced the cable. Going by the service standards I have seen from other companies, Lenovo deserves a gold star here for the prompt service they provided. I will probably end up subscribing to their extended 2-year warranty service when my existing 3-year warranty is about to be over.

The next day I woke up early in the morning; two students from the COEP hostel were volunteering, and we made our way to NCRA, on the Pune University campus. Ironically, though we were under the impression that we would be the late arrivals, it turned out we were the early birds. 5-10 minutes passed by, and soon enough we were joined by Aniket and we played catch-up for a while. We hadn’t met each other in a while, so it was good to catch up. Then slowly other people started coming in, and around 07:10-07:15 we started for GMRT, Khodad.

Now, I had been curious, as I had been hearing for years that the Pune-Nashik NH-50 highway would be concreted and widened into a six-lane highway, but the experience was below par. I came back and realized the proposal has now been pushed back to 2020.

From the Mozilla team, only Aniket was with us; the rest of the group was coming straight from Nashik. Interestingly, all six people who came, came on bikes, which, depending upon how you look at it, was either brave or stupid. Travelling on bikes on Indian highways you have to be either brave or stupid or both; we have more than enough ‘accidents’ due to the quality of road construction, road design, lane-changing drivers and many other issues. This is probably not the place for it, hence I will use some other blog post to rant about that.

We reached around 10:00 hrs. IST and hung around till lunch, as Akshat had all the marketing material, monitors etc. The only things we had were a couple of lappies and a couple of SBCs, an RPi 3 and a BBB.

Aarti Kashyap sharing something about SBC

Our find for the event was Aarti Kashyap, who you can see above. She is a third-year student at COEP and one of the rare people who chose to interact with hardware rather than software. For the last several years, we had been trying, successfully and unsuccessfully, to get more Indian women and girls interested in technology. It is a vicious circle: until a girl/woman volunteers, we are unable to share our knowledge to the extent we can, which leads them to not have much interest in FOSS, or even technology in general.

While there are groups like Django Girls, PyLadies and Rails Girls, and even Outreachy, which try to motivate girls to get into computing, it’s a long road ahead.

We are short of both funds and ideas as to how to motivate more girls to get into computing, and then into playing with hardware. I don’t know where to start and end for whoever wants to play with hardware: from SBCs and routers to blade servers, the sky is the limit. Again, this probably isn’t the place for it; we can chew on it more in some other blog post.

This year, we had a lowish turnout, due to the fact that the first paper of the 12th-standard board exams was on the day we opened. So instead of 20-25k people, we probably had 5-7k fewer pass through. There were two or three things we were showing: we were showing Debian on one of the systems, and the output from the SBCs on the other monitor, but the glare kept hitting the monitors.

The organizers had done exemplary work over last year: they had taped the carpets to the ground, so there was hardly any dust moving around. However, I wished the organizers had taken the pains to have two cloth roofs over our heads instead of just one, with the second roof say 2 feet higher. This would have done two things –

a. It probably would have cooled the place a bit more as –

b. We could have had diffused sunlight, which would have lessened the glare and reflection the LCDs kept throwing back. At times we also got people to come round to our side, as can be seen in Aarti’s photo above.

If these improvements can be made for next year, everybody in our ‘pandal’ would benefit, not just us and Mozilla. This would benefit the around 10-15 organizations which were within the same temporary structure.

Of course, it depends very much on the budget they are able to have and people who are executing, we can just advise.

The other thing which had been missing last year and this year is writing about Single Board Computers in Marathi. If we are to promote them as something to replace a computer, or as something for a younger brother/sister to learn computing on at a lower cost, we need leaflets written in their language to be more effective. And this needs to be in the language and mannerisms that people in that region understand. India, as people might have experienced, is a dialect-prone country, which means that every 2-5 km the way the language is spoken differs from anywhere else. The Marathi spoken by somebody who has lived their whole life in Ravivar Peth and that of a person who has lived in, say, Kothrud are different. The same goes for any place, and this place, Khodad, Narayangaon, would have its own dialect, its own mini-codespeak.

Just to share, we did have one in English, but it would have been a vast improvement if we could have done it in the local language. Maybe we can discuss this and ask for help from people.

Outside, Looking in

Mozillians helping FOSS community and vice-versa

What had been interesting about the whole journey were the new people who were bringing all their passion and creativity to the fore. From the mozilla community, we had Akshay who is supposed to be a wizard on graphics, animation, editing anything to do with the visual medium. He shared some of the work he had done and also shared a bit about how blender works with people who wanted to learn about that.

Mayur, whom you see in the picture pointing out something about FOSS — and this was the culture that we strove to have. I know and love and hate the browser, but haven’t been able to fathom the recklessness that Mozilla has shown over the last few years, which has just been one mis-adventure after another.

For instance, MozStumbler was an effort which I thought would go places. From what little I understood, it served/serves as a user-friendly interface for a potential user while still sharing all the data with OSM. Mozilla seems/seemed to have had a fatalistic take on it, providing initial funding but then never fully committing to the project.

Later, at night, we had the whole ‘free software’ and ‘open source’ sharing session, where I tried to emphasize that without free software, the term ‘open source’ would not have come into existence. We talked and talked, and somewhere around 02:00 I slept; the next day was an extension of the first, where we ribbed each other good-naturedly and still shared whatever we could with each other.

I do hope that we continue this tradition for great many years to come and engage with more and more people every passing year.

Filed under: Miscellaneous Tagged: #budget, #COEP, #volunteering, #debian, #Events, #Expenses, #mozstumbler, #printing, #SBC's, #Science Day 2017, #thinkpad cable issue, FOSS, mozilla

LongNowFrank Ostaseski Seminar Tickets


The Long Now Foundation’s monthly

Seminars About Long-term Thinking

Frank Ostaseski on What the Dying Teach the Living


Monday April 10, 02017 at 7:30pm SFJAZZ Center

Long Now Members can reserve 2 seats, join today! General Tickets $15


About this Seminar:

Frank Ostaseski is a Buddhist teacher, lecturer and author, whose focus is on contemplative end-of-life care. His new book, The Five Invitations: Discovering What Death Can Teach Us About Living Fully, will be released in March 02017.


Sociological ImagesDelusions of Dimorphism

Flashback Friday.

Add to the list of new books to read: Delusions of Gender: How Our Minds, Society, and Neurosexism Create Difference, by Cordelia Fine. Feeding my interest in the issue of sexual dimorphism in humans — which we work so hard to teach to children — the book is described like this:

Drawing on the latest research in neuroscience and psychology, Cordelia Fine debunks the myth of hardwired differences between men’s and women’s brains, unraveling the evidence behind such claims as men’s brains aren’t wired for empathy and women’s brains aren’t made to fix cars.

Good reviews here and here report that Fine tackles an often-cited study of newborn infants’ sex difference in preferences for staring at things, by Jennifer Connellan and colleagues in 2000. They reported:

…we have demonstrated that at 1 day old, human neonates demonstrate sexual dimorphism in both social and mechanical perception. Male infants show a stronger interest in mechanical objects, while female infants show a stronger interest in the face.

And this led to the conclusion: “The results of this research clearly demonstrate that sex differences are in part biological in origin.” They reached this conclusion by alternately placing Connellan herself or a dangling mobile in front of tiny babies, and timing how long they stared. There is a very nice summary of problems with the study here, which seriously undermine its conclusion.

However, even if the methods were good, this is a powerful example of how a tendency toward difference between males and females is turned into a categorical opposition between the sexes — as in, the “real differences between boys and girls.”

To illustrate this, here’s a graphic look at the results in the article, which were reported in this table:

They didn’t report the whole distribution of boys’ and girls’ gaze-times, but it’s obvious that there is a huge overlap in the distributions, despite a difference in the means. In the mobile-gaze-time, for example, the difference in averages is 9.7 seconds, while the standard deviations are more than 20 seconds. If I turn to my handy normal curve spreadsheet template, and fit it with these numbers, you can see what the pattern might look like (I truncate these at 0 seconds and 70 seconds, as they did in the study):

Source: My simulation assuming normal distributions from the data in the table above.
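The overlap can be quantified directly. A minimal sketch, assuming normal distributions with the reported 9.7-second mean difference and standard deviations of 20 seconds (the text says "more than 20", so treat this as a lower bound on the overlap):

```python
import math

def phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Mobile gaze-times: means 9.7 s apart, SDs of roughly 20 s each,
# distributions assumed normal (as in the simulation above).
diff, sd = 9.7, 20.0

# The difference of two independent normals is normal with SD sd*sqrt(2),
# so this is the chance a randomly chosen boy out-stares a randomly
# chosen girl at the mobile.
p = phi(diff / (sd * math.sqrt(2)))
print(f"P(random boy > random girl) = {p:.2f}")  # ~0.63
```

So even taking the study at face value, a random boy beats a random girl at mobile-staring only about 63% of the time — barely better than a coin flip, which is exactly what "huge overlap" means.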

All I’m trying to say is that the sexes aren’t opposites, even if they have some differences that precede socialization.

If you could show me that the 1-day-olds who stare at the mobiles for 52 seconds are more likely to be engineers when they grow up than the ones who stare at them for 41 seconds (regardless of their gender) then I would be impressed. But absent that, if you just want to use such amorphous differences at birth to explain actual segregation among real adults, then I would not be impressed.

Originally posted in September, 2010.

Philip N. Cohen is a professor of sociology at the University of Maryland, College Park. He writes the blog Family Inequality and is the author of The Family: Diversity, Inequality, and Social Change. You can follow him on Twitter or Facebook.


Worse Than FailureError'd: Nothing to Lose

"With fraud protection like this, I feel very safe using my card everywhere," Brad W. writes.


"Well if so many other people are buying them - they must be good right?" writes Andy H.


David A. wrote, "Attempting to use the Asus WinFlash utility to update the BIOS on my Asus laptop left me confused and in doubt."


Mathias S. wrote, "You know, I think I'll just play it safe and just go with 'Show notifications for '%1!u! minutes'"


"To me, seeing three overlaid spinners suggests you'll be waiting for a while," writes Daniel C.


Quentin G. wrote, "If only this was the first one of these that wasn't supposed to show up I wouldn't have submitted it, but crap, after the third one in a row they deserve to be shamed."


"Ok, so the error is pretty obvious, but what gets me is that they couldn't be bothered to fix the actual bug, yet cared enough to put up that warning," writes Jamie.


[Advertisement] Manage IT infrastructure as code across all environments with Puppet. Puppet Enterprise now offers more control and insight, with role-based access control, activity logging and all-new Puppet Apps. Start your free trial today!

Planet DebianAntonio Terceiro: Patterns for Testing Debian Packages

At the end of 2016 I had the pleasure to attend the 11th Latin American Conference on Pattern Languages of Programs, a.k.a. SugarLoaf PLoP. PLoP is a series of conferences on Patterns (as in “Design Patterns”), a subject that I appreciate a lot. Each of the PLoP conferences except the original “big” one has a funny name. SugarLoaf PLoP is called that because its very first edition was held in Rio de Janeiro, so the organizers named it after a very famous mountain in Rio. The name stuck even though it has been a long time since the conference was last held in Rio. 2016 was actually the first time SugarLoaf PLoP was held outside of Brazil, finally justifying the “Latin American” part of its name.

I was presenting a paper I wrote on patterns for testing Debian packages. The Debian project funded my travel expenses through the generous donations of its supporters. PLoPs are very fun conferences with a relaxed atmosphere, and it is amazing how many smart (and interesting!) people gather together for them.

My paper is titled “Patterns for Writing As-Installed Tests for Debian Packages”, and has the following abstract:

Large software ecosystems, such as GNU/Linux distributions, demand a large amount of effort to make sure all of their components work correctly individually, and also integrate correctly with each other to form a coherent system. Automated Quality Assurance techniques can prevent issues from reaching end users. This paper presents a pattern language originated in the Debian project for automated software testing in production-like environments. Such environments are closer to the environment where software will be actually deployed and used, as opposed to the development environment under which developers and regular Continuous Integration mechanisms usually test software products. The pattern language covers the handling of issues arising from the difference between development and production-like environments, as well as solutions for writing new, exclusive tests for as-installed functional tests. Even though the patterns are documented here in the context of the Debian project, they can also be generalized to other contexts.

In practical terms, the paper documents a set of patterns I have noticed in the last few years, when I have been pushing the Debian Continuous Integration project. It should be an interesting read for people interested in the testing of Debian packages in their installed form, as done with autopkgtest. It should also be useful for people from other distributions interested in the subject, as the issues are not really Debian-specific.
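
For readers unfamiliar with autopkgtest, a minimal as-installed test declaration looks roughly like this (the package layout follows the DEP-8 specification; the test script contents and the `mytool` binary are hypothetical):

```
# debian/tests/control (DEP-8): declares one test named "smoke";
# "@" in Depends expands to all binary packages built from this source,
# so the test runs against the *installed* packages, not the build tree.
Tests: smoke
Depends: @

# debian/tests/smoke: an executable script; a non-zero exit status
# (or, by default, any output on stderr) marks the test as failed.
#!/bin/sh
set -e
mytool --version
```

The key pattern is that the test exercises the package exactly as a user would receive it, which is what distinguishes as-installed testing from ordinary CI runs in the source tree.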

I have recently finished the final version of the paper, which should be published in the ACM Digital Library at any point now. You can download a copy of the paper in PDF. Source is also available, if you are into markdown, LaTeX, makefiles and this sort of thing.

If everything goes according to plan, I should be presenting a talk on this at the next Debconf in Montreal.

Krebs on SecurityGoogle Points to Another POS Vendor Breach

For the second time in the past nine months, Google has inadvertently but nonetheless correctly helped to identify the source of a large credit card breach — by assigning a “This site may be hacked” warning beneath the search results for the Web site of a victimized merchant.

A little over a month ago, KrebsOnSecurity was contacted by multiple financial institutions whose anti-fraud teams were trying to trace the source of a great deal of fraud on cards that were all used at a handful of high-end restaurants around the country.

Two of those fraud teams shared a list of restaurants that all affected cardholders had visited recently. A bit of searching online showed that nearly all of those establishments were run by Select Restaurants Inc., a Cleveland, Ohio company that owns a number of well-known eateries nationwide, including Boston’s Top of the Hub; Parker’s Lighthouse in Long Beach, Calif.; the Rusty Scupper in Baltimore, Md.; Parkers Blue Ash Tavern in Cincinnati, Ohio; Parkers’ Restaurant & Bar in Downers Grove, Illinois; Winberie’s Restaurant & Bar with locations in Oak Park, Illinois and Princeton and Summit, New Jersey; and Black Powder Tavern in Valley Forge, PA.

Google's search listing for Select Restaurants, which indicates Google thinks this site may be hacked.

Google’s search listing for Select Restaurants, which indicates Google thinks this site may be hacked.

Knowing very little about this company at the time, I ran a Google search for it and noticed that Google believes the site may be hacked (it still carries this message). This generally means some portion of the site was compromised by scammers who are trying to abuse the site’s search engine rankings to beef up the rankings for “spammy” sites — such as those peddling counterfeit prescription drugs and designer handbags.

The “This site may be hacked” advisory is not quite as dire as Google’s “This site may harm your computer” warning — the latter usually means the site is actively trying to foist malware on the visitor’s computer. But in my experience it’s never a good sign when a business that accepts credit cards has one of these warnings attached to its search engine results.

Case in point: I experienced this exact scenario last summer as I was reporting out the details on the breach at CiCi’s Pizza chain. In researching that story, all signs were pointing to a point-of-sale (POS) terminal provider called Datapoint POS. Just like it did with Select Restaurants’s site, Google reported that Datapoint’s site appeared to be hacked.

Google thinks Datapoint's Web site is trying to foist malicious software.

Google believed Datapoint’s Web site was hacked.

Select Restaurants did not return messages seeking comment. But as with the breach at Cici’s Pizza chains, the breach involving Select Restaurant locations mentioned above appears to have been the result of an intrusion at the company’s POS vendor — Geneva, Ill. based 24×7 Hospitality Technology. 24×7 handles credit and debit card transactions for thousands of hotels and restaurants.

On Feb. 14, 24×7 Hospitality sent a letter to customers warning that its systems recently were hacked by a “sophisticated network intrusion through a remote access application.” Translation: Someone guessed or phished the password it used to remotely administer point-of-sale systems at customer locations. 24×7 said the attackers subsequently executed the PoSeidon malware variant, which is designed to siphon card data when cashiers swipe credit cards at an infected cash register (for more on PoSeidon, check out POS Providers Feel Brunt of PoSeidon Malware).

KrebsOnSecurity obtained a copy of the letter (PDF) that 24×7 Hospitality CEO Todd Baker, Jr. sent to Select Restaurants. That missive said even though the intruders apparently had access to all of 24×7 customers’ payment systems, not all of those systems were logged into by the hackers. Alas, this was probably little consolation for Select Restaurants, because the letter then goes on to say that the breach involves all of the restaurants listed on Select’s Web site, and that the breach appears to have extended from late October 2016 to mid-January 2017.


From my perspective, organized crime gangs have so completely overrun the hospitality and restaurant point-of-sale systems here in the United States that I just assume my card may very well be compromised whenever I use it at a restaurant or hotel bar/eatery. I’ve received no fewer than three new credit cards over the past year, and I’d wager that in at least one of those cases I happened to have used the card at multiple merchants whose POS systems were hacked at the same time.

But no matter how many times I see it, it’s fascinating to watch this slow motion train wreck play out. Given how much risk and responsibility for protecting against these types of hacking incidents is spread so thinly across the entire industry, it’s little wonder that organized crime gangs have been picking off POS providers for Tier 3 and Tier 4 merchants with PoSeidon en masse in recent years.

I believe one big reason we keep seeing the restaurant and hospitality industry being taken to the cleaners by credit card thieves is that in virtually all of these incidents, the retailer or restaurant has no direct relationships to the banks which have issued the cards that will be run through their hacked POS systems. Rather, these small Tier 3 and Tier 4 merchants are usually buying merchant services off of a local systems integrator who often is in turn reselling access to a third-party payment processing company.

As a result, very often when these small chains or solitary restaurants get hit with PoSeidon, there is no record of a breach that is simple to follow from the breached merchant back to the bank which issued the cards used at those compromised merchants. It is only by numerous financial institutions experiencing fraud from the same restaurants and then comparing notes about possible POS vendors in common among these restaurants that banks and credit unions start to gain a clue about what’s happening and who exactly has been hacked.

But this takes a great deal of time, effort and trust. Meanwhile, the crooks are laughing all the way to the bank. Another reason I find all this fascinating is that the two main underground cybercrime shops that appear to be principally responsible for offloading cards stolen in these Tier 3 and Tier 4 merchant breaches involving PoSeidon — stores like Rescator and Briansdump — both abuse my likeness in their advertisements and on their home pages. Here’s Briansdump:

An advertisement for the carding shop “briansdump[dot]ru” promotes “dumps from the legendary Brian Krebs.” Needless to say, this is not an endorsed site.

An advertisement for the carding shop “briansdump[dot]ru” promotes “dumps from the legendary Brian Krebs.” Needless to say, this is not an endorsed site.

Here’s the login page for the rather large stolen credit card bazaar known as Rescator:

The login page for Rescator, a major seller of credit and debit cards stolen in countless attacks targeting retailers, restaurants and hotels.

The login page for Rescator, a major seller of credit and debit cards stolen in countless attacks targeting retailers, restaurants and hotels.

Point-of-sale malware has driven most of the major retail industry credit card breaches over the past two years, including intrusions at Target and Home Depot, as well as breaches at a ridiculous number of point-of-sale vendors. The malware sometimes is installed via hacked remote administration tools like LogMeIn; in other cases the malware is relayed via “spear-phishing” attacks that target company employees. Once the attackers have their malware loaded onto the point-of-sale devices, they can remotely capture data from each card swiped at that cash register.

Thieves can then sell that data to crooks who specialize in encoding the stolen data onto any card with a magnetic stripe, and using the cards to purchase high-priced electronics and gift cards from big-box stores like Target and Best Buy.

Readers should remember that they’re not liable for fraudulent charges on their credit or debit cards, but they still have to report the unauthorized transactions. There is no substitute for keeping a close eye on your card statements. Also, consider using credit cards instead of debit cards; having your checking account emptied of cash while your bank sorts out the situation can be a hassle and lead to secondary problems (bounced checks, for instance).

Finally, if your credit card is compromised, try not to lose sleep over it: The chances of your finding out how that card was compromised are extremely low. This story seeks to explain why.

Update: March 18, 2:52 p.m. ET: An earlier version of this story referenced Buffalo Wild Wings as a customer of 24×7 Hospitality, as stated in many places on 24×7’s site (PDF). Buffalo Wild Wings wrote in to say that it does not use the specific POS systems that were attacked, and that it is asking 24×7 to remove their brand and logo from the site.



Planet DebianThorsten Glaser: Updates to the last two posts

Someone from the FSF’s licencing department posted an official-looking thing saying they don’t believe GitHub’s new ToS to be problematic with copyleft. Well, my lawyer (not my personal one, nor for The MirOS Project, but related to another association, informally) does agree with my reading of the new ToS, and I can point out at least a clause in the GPLv1 (I really don’t have time right now) which says the contrary (but does this mean the FSF generally waives the restrictions of the GPL for anything on GitHub?). I’ll eMail GitHub Legal directly and will try to continue getting this fixed (as soon as I have enough time for it) as I’ll otherwise be forced to force GitHub to remove stuff from me (but with someone else as original author) under GPL, such as… tinyirc and e3.

My dbconfig-common Debian packaging example got a rather hefty upgrade because dbconfig-common (unlike any other DB schema framework I know of) doesn’t apply the upgrades on a fresh install (and doesn’t automatically put the upgrades into a transaction either) but only upgrades between Debian package versions (which can be funny with backports, but AFAICT that part is handled correctly). I now append the upgrades to the initial-version-as-seen-in-the-source to generate the initial-version-as-shipped-in-the-binary-package (optionally, only if it’s named .in) removing all transaction stuff from the upgrade files and wrapping the whole shit in BEGIN; and COMMIT; after merging. (This should at least not break nōn-PostgreSQL databases and… well, database-like-ish things I cannot test for obvious (SQLite is illegal, at least in Germany, but potentially worldwide, and then PostgreSQL is the only remaining Open Source database left ;) reasons.)
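
The merge step described above can be sketched as a small shell script. This is a self-contained illustration under assumed file names and a toy schema, not the actual layout of the packaging example:

```shell
#!/bin/sh
# Self-contained sketch of the merge step: append the upgrade files to
# the initial schema, strip their per-file transaction statements, and
# wrap the merged result in one BEGIN;/COMMIT; pair.
set -e
work=$(mktemp -d)
mkdir -p "$work/install" "$work/upgrade"
# the initial schema as found in the source package:
printf 'CREATE TABLE t (id integer);\n' > "$work/install/pgsql.in"
# an upgrade file that carries its own transaction statements:
printf 'BEGIN;\nALTER TABLE t ADD COLUMN n text;\nCOMMIT;\n' \
  > "$work/upgrade/2.0"
# build the initial-version-as-shipped-in-the-binary-package:
{
  printf 'BEGIN;\n'
  cat "$work/install/pgsql.in"
  for f in "$work"/upgrade/*; do
    # drop each upgrade file's own BEGIN;/COMMIT; lines
    sed -E '/^(BEGIN|COMMIT);[[:space:]]*$/d' "$f"
  done
  printf 'COMMIT;\n'
} > "$work/install/pgsql"
cat "$work/install/pgsql"
```

The result is one schema file that runs as a single transaction, so a fresh install ends up with the same schema an upgraded installation would have.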

Update: Yes, this does mean that maintainers of databases and webservers should send me patches to make this work with not-PostgreSQL (new install/, upgrade files) and not-Apache-2.2/2.4 (new debian/*/*.conf snippets) to make this packaging example even more generally usable.

Natureshadow already forked this and made a Python/Flask package from it, so I’ll prod him to provide a similarly versatile hello-python-world example package.

Planet DebianJoey Hess: end of an era

I'm at home downloading hundreds of megabytes of stuff. This is the first time I've been in position of "at home" + "reasonably fast internet" since I moved here in 2012. It's weird!

Satellite internet dish with solar panels in foreground

While I was renting here, I didn't mind dialup much. In a way it helps to focus the mind and build interesting stuff. But since I bought the house, the prospect of only dialup at home ongoing became more painful.

While I hope to get on the fiber line that's only a few miles away eventually, I have not convinced that ISP to build out to me yet. Not enough neighbors. So, satellite internet for now.

9.1 dB SNR

speedtest results: 15 megabit down / 4.5 up with significant variation

Dish seems well aligned, speed varies a lot, but is easily hundreds of times faster than dialup. Latency is 2x dialup.

The equipment uses more power than my laptop, so with the current solar panels, I anticipate using it only 6-9 months of the year. So I may be back to dialup most days come winter, until I get around to adding more PV capacity.

It seems very cool that my house can capture sunlight and use it to beam signals 20 thousand miles into space. Who knows, perhaps there will even be running water one day.

Satellite dish

LongNowThe Other 10,000 Year Project: Long-Term Thinking and Nuclear Waste

With half-lives ranging from 30 years to 24,000 years, or even 16 million years, the radioactive elements in nuclear waste defy our typical operating time frames. The questions around nuclear waste storage — how to keep it safe from those who might wish to weaponize it, where to store it, by what methods, for how long, and with what markings, if any, to warn humans who might stumble upon it thousands of years in the future — require long-term thinking.

The Yucca Mountain nuclear waste repository was set to open on March 21, 02017, but has been indefinitely delayed / via High Country News

I. “A Clear and Present Danger.”

“For anyone living in SOCAL, San Onofre nuclear waste is slated to be buried right underneath the sands,” tweeted @JoseTCastaneda3 in February 02017. “Can we say ‘Fukushima #2’ yet?”

The “San Onofre” the user was referring to is the San Onofre nuclear plant in San Diego County, California, which sits on scenic bluffs overlooking the Pacific Ocean and sands dotted with surfers and beach umbrellas. Once a provider of eighteen percent of Southern California’s energy demands, San Onofre is in the midst of a 20-year, $4.4 billion demolition project following the failure of replacement steam generators in 02013. At the time, Senator Barbara Boxer said San Onofre was “unsafe and posed a danger to the eight million people living within fifty miles of the plant,” and opened a criminal investigation.

A part of the demolition involved figuring out what to do with the plant’s millions of pounds of high-level waste (the “spent fuel” left over after uranium is processed) that simmered on-site in nuclear pools. It was decided that the nuclear waste would be transported a few hundred yards to the beach, where it would be buried underground in what local residents have taken to calling the “concrete monolith” – a state-of-the-art dry cask storage container that will house 75 concrete-sealed tubes of San Onofre’s nuclear waste until 02049.

This has left a lot of San Diego County residents unhappy.

The San Onofre Nuclear Generating Station seen from San Onofre State Beach in San Clemente / via Jeff Gritchen, OC Register

“We held a sacred water ceremony today @ San Onofre where 3.6mm lbs of nuclear waste are being buried on the beach near the San Andreas faultline,” tweeted Gloria Garrett, hinting at a nuclear calamity to come.

Congressman Darrell Issa, who represents the district of the decommissioned plant and introduced a bill in February 02017 to relocate the waste from San Onofre, was concerned about the bottom line.

“It’s just located on the edge of an ocean and one of the busiest highways in America,” Issa said in an interview with the San Diego Tribune. “We’ll be paying for storage for decades and decades if we don’t find a solution. And that will be added to your electricity bill.”

“The issue of what to do with nuclear waste is a clear and present danger to every human life within 100 miles of San Onofre,” said Charles Langley of the activist group Public Watchdogs.

“Everyone is whistling past the graveyard, including our regulators,” Langley continued. “They are storing nuclear waste that is deadly to humans for 10,000 generations in containers that are only guaranteed to last 25 years.”

II. The Nuclear Waste Stalemate

Nobody wants a nuclear waste storage dump in their backyards.

That is, in essence, the story of America’s pursuit of nuclear energy as a source of electricity for the last sixty years.

In 01957, the first commercial nuclear reactor in the United States opened. That same year, the National Academy of Sciences (NAS) recommended that spent fuel be transported from reactors and buried deep underground. Those recommendations went largely unheeded until the Three Mile Island meltdown of March 01979, when 40,000 gallons of radioactive wastewater from the reactor poured into Pennsylvania’s Susquehanna River.

The political challenge of convincing any jurisdiction to store nuclear waste for thousands of years has vexed lawmakers ever since. As Marcus Stroud put it in his in-depth 02012 investigative feature into the history of nuclear waste storage in the United States:

Though every presidential administration since Eisenhower’s has touted nuclear power as integral to energy policy (and decreased reliance on foreign oil), none has resolved the nuclear waste problem. The impasse has not only allowed tens of thousands of tons of radioactive waste to languish in blocks of concrete behind chain link fences near major cities. It has contributed to a declining nuclear industry, as California, Wisconsin, West Virginia, Oregon, and other states have imposed moratoriums against new power plants until a waste repository exists. Disasters at Fukushima, Chernobyl, and Three Mile Island have made it very difficult, expensive, and time-consuming to build a nuclear reactor because of insurance premiums and strict regulations, and the nuclear waste stalemate has added significantly to the difficulties and expenses. Only two new nuclear power plants have received licenses to operate in the last 30 years.

Yucca Mountain was designated as the site for a national repository of nuclear waste in the Nuclear Waste Act of 01987. It was to be a deep geological repository for permanently sealing off and storing all of the nation’s nuclear waste, one that would require feats of engineering and billions of dollars to build. Construction began in the 01990s.  The repository was scheduled to open and begin accepting waste on March 21, 02017.

But pushback from Nevadans, who worried about long-term radiation risks and felt that it was unfair to store nuclear waste in a state that has no nuclear reactors, left the project defunded and on indefinite hiatus since 02011.

Today, nuclear power provides twenty percent of America’s electricity, and its reactors have produced almost 70,000 tons of waste to date. Most of the 121 nuclear sites in the United States opt for the San Onofre route, storing waste on-site in dry casks made of steel and concrete as they wait for the Department of Energy to choose a new repository.

III. Opening Ourselves to Deep Time

“We must have the backbone to look these enormous spans of time in the eye. We must have the courage to accept our responsibility as our planet’s – and our descendants’ – caretakers, millennium in and millennium out, without cowering before the magnitude of our challenge.” —Vincent Ialenti

An aerial view of Posiva Oy’s prospective nuclear waste repository site in Olkiluoto, Finland / via Posiva Oy

Anthropologist Vincent Ialenti recently spent two years doing field work with a Finnish team of experts researching the Onkalo long-term geological repository in Western Finland that, like Yucca Mountain, would store all of Finland’s nuclear waste. The Safety Case project, as it was called, required experts to think in deep time about the myriad factors (geological, ecological, and climatological) that might affect the site as it stored waste for thousands of years.

Ialenti’s goal was to examine how these experts conceived of the future:

What sort of scientific ethos, I wondered, do Safety Case experts adopt in their daily dealings with seemingly unimaginable spans of time? Has their work affected how they understand the world and humanity’s place within it? If so, how? If not, why not?

In the process, Ialenti found that his engagement with problems of deep time (“At what pace will Finland’s shoreline continue expanding outward into the Baltic Sea? How will human and animal populations’ habits change? What happens if forest fires, soil erosion or floods occur? How and where will lakes, rivers and forests sprout up, shrink and grow? What role will climate change play in all this?”) changed the way he conceived of the world around him, the stillness and serenity of the landscapes transforming into a “Finland in flux”:

I imagined the enormous Ice Age ice sheet that, 20,000 years ago, covered the land below. I imagined Finland decompressing when this enormous ice sheet later receded — its shorelines extending outward as Finland’s elevation rose ever higher above sea level. I imagined coastal areas of Finland emerging from the ice around 10,000 BC. I imagined lakes, rivers, forests and human settlements sprouting up, disappearing and changing shape and size over the millennia.

Ialenti’s field work convinced him of the necessity of long-term thinking in the Anthropocene, and that engaging with the problem of nuclear waste storage, unlikely though it may seem, is a useful way of inspiring it:

Many suggest we have entered the Anthropocene — a new geologic epoch ushered in by humanity’s own transformations of Earth’s climate, erosion patterns, extinctions, atmosphere and rock record. In such circumstances, we are challenged to adopt new ways of living, thinking and understanding our relationships with our planetary environment. To do so, anthropologist Richard Irvine has argued, we must first “be open to deep time.” We must, as Stewart Brand has urged, inhabit a longer “now.”

So, I wonder: Could it be that nuclear waste repository projects — long approached by environmentalists and critical intellectuals with skepticism — are developing among the best tools for re-thinking humanity’s place within the deeper history of our environment? Could opening ourselves to deep, geologic, planetary timescales inspire positive change in our ways of living on a damaged planet?

IV. How Long is Too Long?

Finland’s Onkalo repository for nuclear waste / via Remon

Finland’s Onkalo Repository is designed to last for 100,000 years. In the 01990s, the U.S. Environmental Protection Agency decided that a 10,000-year time span was how long a U.S. nuclear waste storage facility must remain sealed off, basing its decision in part on the predicted frequencies of ice ages.

But as Stroud reports, it was basically guesswork:

Later, [the 10,000-year EPA standard] was increased to a million years by the U.S. Court of Appeals in part due to the long half lives of certain radioactive isotopes and in part due to a significantly less conservative guess.

The increase in time from 10,000 years to 1 million years made the volcanic cones at Yucca look less stable and million-year-old salt deposits — like those found in New Mexico — more applicable to the nuclear waste problem.

[The Department of Energy] hired anthropologists to study the history of language—both at Yucca and at the WIPP site in New Mexico—to conceive of a way to communicate far into the future that waste buried underground was not to be disturbed.

But the Blue Ribbon Commission’s report [of 02012] calls these abstract time periods a little impractical.

“Many individuals have told [BRC] that it is unrealistic to have a very long (e.g., million-year) requirement,” it reads. “[BRC] agrees.”

It then points out that other countries “have opted for shorter timeframes (a few thousand to 100,000 years), some have developed different kinds of criteria for different timeframes, and some have avoided the use of a hard ‘cut-off’ altogether.” The conclusion? “In doing so, [these countries] acknowledge the fact that uncertainties in predicting geologic processes, and therefore the behavior of the waste in the repository, increase with time.”

Public Law 102-579, 106, Statute 4777 calls for nuclear waste to be stored for at least 10,000 years / via EPA

In a spirited 02006 Long Now debate between Global Business Network co-founder and Long Now board member Peter Schwartz and Ralph Cavanagh of the Nuclear Resources Defense Council, Cavanagh pressed Schwartz on the problem of nuclear waste storage.

Schwartz contended that we’ve defined the nuclear waste problem incorrectly, and that reframing the time scale associated with storage, coupled with new technologies, would ease concerns among those who take it on:

The problem of nuclear waste isn’t a problem of storage for a thousand years or a million years. The issue is storing it long enough so we can put it in a form where we can reprocess it and recycle it, and that form is probably surface storage in very strong caskets in relatively few sites, i.e., not at every reactor, and also not at one single national repository, but at several sites throughout the world with it in mind that you are not putting waste in the ground forever where it could migrate and leak and raise all the concerns that people rightly have about nuclear waste storage. By redesigning the way in which you manage the waste, you’d change the nature of the challenge fundamentally.

Schwartz and other advocates of recycling spent fuel have discussed new pyrometallurgical technologies for reprocessing that could make nuclear power “truly sustainable and essentially inexhaustible.” These emerging pyro-processes, coupled with faster nuclear reactors, can capture upwards of 100 times more energy and produce little to no plutonium, thereby easing concerns that the waste could be weaponized. Recycling spent fuel would vastly reduce the amount of high-level waste, as well as the length of time that the waste must be isolated. (The Argonne National Laboratory believes its pyrochemical processing methods can drop the time needed to isolate waste from 300,000 years to 300 years).

There’s just one problem: the U.S. currently does not reprocess or recycle its spent fuel. President Jimmy Carter banned the commercial reprocessing of nuclear waste in 01977 over concerns that the plutonium in spent fuel could be extracted to produce nuclear weapons. Though President Reagan lifted the ban in 01981, the federal government has for the most part declined to provide subsidies for commercial reprocessing, and subsequent administrations have spoken out against it. Today, the “ban” effectively remains in place.

Inside Onkalo / via Posiva Oy

When the ban was first issued, the U.S. expected other nuclear nations like Great Britain and France to follow suit. They did not. Today, France generates eighty percent of its electricity from nuclear power, with much of that energy coming from reprocessing and recycling spent fuel. Japan and the U.K. reprocess their fuel, and China and India are modeling their reactors on France’s reprocessing program. The United States, on the other hand, uses less than five percent of its nuclear fuel, storing the rest as waste.

In a 02015 op-ed for Forbes, William F. Shughart, research director for the Independent Institute in Oakland, California, argued that we must lift the nuclear recycling “ban” and take full advantage of our nuclear capacity if we wish to adequately address the threats posed by climate change:

Disposing of “used” fuel in a deep-geologic repository as if it were worthless waste – and not a valuable resource for clean-energy production – is folly.

Twelve states have banned the construction of nuclear plants until the waste problem is resolved. But there is no enthusiasm for building the proposed waste depository. In fact, the Obama administration pulled the plug on the one high-level waste depository that was under construction at Nevada’s Yucca Mountain.

The outlook might be different if Congress were to lift the ban on nuclear-fuel recycling, which would cut the amount of waste requiring disposal by more than half. Instead of requiring a political consensus on multiple repository sites to store nuclear plant waste, one facility would be sufficient, reducing disposal costs by billions of dollars.

By lifting the ban on spent fuel recycling we could make use of a valuable resource, provide an answer to the nuclear waste problem, open the way for a new generation of nuclear plants to meet America’s growing electricity needs, and put the United States in a leadership position on climate-change action.

According to Stroud, critics of nuclear reprocessing cite its cost (a Japanese government report from 02004 found reprocessing to be four times as costly as non-reprocessed nuclear power); the current abundance of uranium (Stroud says most experts agree that “if the world’s needs quadrupled today, uranium wouldn’t run out for another eighty years”); the fact that while reprocessing produces less waste, it still wouldn’t eliminate the need for a site to store it; and finally, the risk of spent fuel being used to make nuclear weapons.

Shughart, along with Schwartz and many others in the nuclear industry, feels the fears of nuclear proliferation from reprocessing are overblown:

The reality is that no nuclear materials ever have been obtained from the spent fuel of a nuclear power plant, owing both to the substantial cost and technical difficulty of doing so and because of effective oversight by the national governments and the International Atomic Energy Agency.

V. Curiosity Kills the Ray Cat

Whether we ultimately decide to store spent fuel for 10,000 years in a sealed-off repository deep underground or for 300 years in above-ground casks, there’s still the question of how to effectively mark nuclear waste to warn future generations who might stumble upon it. The languages we speak now might not be spoken in the future, so the written word must be cast aside in favor of “nuclear semiotics,” whose symbols must stand the test of time.

After the U.S. Department of Energy assembled a task force of anthropologists and linguists to tackle the problem in 01981, French author Françoise Bastide and Italian semiologist Paolo Fabbri proposed an intriguing solution: ray cats.

Artist rendering of ray cats / via Aeon

Imagine a cat bred to turn green when near radioactive material. That is, in essence, the ray cat solution.

“[Their] role as a detector of radiation should be anchored in cultural tradition by introducing a suitable name (e.g., ‘ray cat’),” Bastide and Fabbri wrote at the time.

The idea has recently been revived. The Ray Cat Movement was established in 02015 to “insert ray cats into the cultural vocabulary.”

Alexander Rose, Executive Director at Long Now, who has visited several of the proposed nuclear waste sites, suggests, however, that solutions like the ray cats address only part of the problem.

“Ray cats are cute, but the solution doesn’t promote a myth that can be passed down for generations,” he said. “The problem isn’t detection technology. The problem is how you create a myth.”

Rose said the best solution might be to not mark the waste sites at all.

“Imagine the seals on King Tut’s tomb,” Rose said. “Everything marked on the tomb carries the same warnings we’re talking about with nuclear waste storage: markings that say you will get sick and that there will be a curse upon your family for generations. Those warnings virtually guaranteed that the tomb would be opened if found.”

The unbroken seal on King Tutankhamun’s tomb

“What if you didn’t mark the waste, and instead put it in a well-engineered, hard-to-get-to place that no one would go to unless they thought there was something there. The only reason they’d know something was there was if the storage was marked.”

Considering the relatively low number of casualties that could come from encountering nuclear waste in the far future, Rose suggests that the best way to reduce risk is likely to avoid attention.

VI. A Perceived Abundance of Energy

San Onofre’s nuclear waste will sit in a newly-developed Umax dry-cask storage container system made of the most corrosion-resistant grade of stainless steel. It is, according to regulators, earthquake-ready.

At San Onofre, wood squares mark the spots where containers of spent fuel will be encased in concrete / via Jeff Gritchen, OC Register

Environmentalists are nonetheless concerned that the storage containers could crack, given the salty and moist environment of the beach. Others fear that an earthquake coupled with a tsunami could cause a Fukushima-like meltdown on the West Coast.

“Dry cask storage is a proven technology that has been used for more than three decades in the United States, subject to review and licensing by the U.S. Nuclear Regulatory Commission,” said a spokeswoman for Edison, the company that runs San Onofre, in an interview with the San Diego Union Tribune.

A lawsuit is pending in the San Diego Superior Court that challenges the California Coastal Commission’s 02015 permit for the site. A hearing is scheduled for March 02017. If the lawsuit is successful, the nuclear waste in San Onofre might have to move elsewhere sooner than anybody thought.

Meanwhile, the U.S. Department of Energy in January 02017 started efforts to move nuclear waste to temporary storage sites in New Mexico and West Texas that could store the waste until a more long-term solution is devised. Donald Trump’s new Secretary of Energy, former Texas governor Rick Perry, is keen to see waste move to West Texas. Residents of the town of Andrews are split. Some see it as a boon for jobs. Others, as a surefire way to die on the job.

Regardless of how Andrews’ residents feel, San Onofre’s waste could soon be on the way.

Tom Palmisano, Chief Nuclear Officer for Edison, the company that runs San Onofre, expressed doubts and frustration in an interview with the Orange County Register:

There could be a plan, and a place, for this waste within the next 10 years, Palmisano said – but that would require congressional action, which in turn would likely require much prodding from the public.

“We are frustrated and, frankly, outraged by the federal government’s failure to perform,” he said. “I have fuel I can ship today, and throughout the next 15 years. Give me a ZIP code and I’ll get it there.”

A prodding public might be in short supply. According to the latest Gallup poll, support for nuclear power in the United States has dipped to a fifteen-year low. For the first time since Gallup began asking the question in 01994, a majority of Americans (54%) oppose nuclear as an alternative energy source.

Support for nuclear energy in the United States / via Gallup

Gallup suggests the decline in support is prompted less by fears about safety after incidents like the 02011 Fukushima nuclear plant meltdown, and more by “energy prices and the perceived abundance of energy sources.” Gallup found that Americans historically only perceive a looming energy shortage when gas prices are high. Lower gas prices at the pump over the last few years have Americans feeling less worried about the nation’s energy situation than ever before.

Taking a longer view, the oil reserves fueling low gas prices will continue to dwindle. With the risks of climate change imminent, many in the nuclear industry argue that nuclear power would radically reduce CO2 levels and provide a cleaner, more efficient form of energy.

But if a widespread embrace of nuclear technology comes to pass, it will require more than a change in sentiment in the U.S. public about its energy future. It will require people embracing the long-term nature of dealing with nuclear waste and, ultimately, trusting future generations to continue to solve these issues.


LongNowA Brief Economic History of Time

“The age of exploration and the industrial revolution completely changed the way people measure time, understand time, and feel and talk about time,” writes Derek Thompson of The Atlantic. “This made people more productive, but did it make them any happier?”

In a wide-ranging essay touching upon the advent of the wristwatch, railroads, and Daylight Saving Time, Thompson reveals how the short-term time frames in our day-to-day experience that are so familiar to us — concepts like the work day, happy hour, the weekend, and retirement—were inventions of the last 150 years of economic change:

Three forces contributed to the modern invention of time. First, the conquest of foreign territories across the ocean required precise navigation with accurate timepieces. Second, the invention of the railroad required the standardization of time across countries, replacing the local system of keeping time using shadows and sundials. Third, the industrial economy necessitated new labor laws, which changed the way people think about work.

“So much of what we now call time,” concludes Thompson, “is a collective myth.” This collective myth helped power the industrial revolution and make our modern world. But, as Stewart Brand wrote at the founding of the Long Now Foundation, it has also contributed to civilization “revving itself into a pathologically short attention span”:

The trend might be coming from the acceleration of technology, the short-horizon perspective of market-driven economics, the next-election perspective of democracies, or the distractions of personal multi-tasking. All are on the increase. Some sort of balancing corrective to the short-sightedness is needed: some mechanism or myth which encourages the long view and the taking of long-term responsibility, where ‘long-term’ is measured at least in centuries. Long Now proposes both a mechanism and a myth.

You can read Thompson’s essay in its entirety here.

TEDMeet the Spring 2017 class of TED Residents

TED Residents Steve Rosenbaum and Nikki Allen Webber break the ice (that’s Alison Cornyn in the background). [Photo: Dian Lofton / TED]

On March 6, TED welcomed its latest class to the TED Residency program, an in-house incubator for breakthrough ideas. Residents spend four months in the TED office with other exceptional people from all over the map. Each has a project that promises to make a significant contribution to the world, across several different fields.

The new Residents include:

  • A technologist working on an app to promote world peace
  • An entrepreneur whose packaging business wants to break America’s addiction to plastic
  • A documentarian profiling young people of color grappling with mental-health challenges
  • A journalist telling the stories of families and friends affected by deportation
  • A programmer who wants to teach kids how to code … without computers
  • A writer-photographer chronicling the lives of Chinese takeout workers in New York City
  • A scientist studying an easier path to deeper sleep

TED Resident alumnus Cavaughn Noel and new TED Resident Evita Turquoise Robinson come together for the TEDStart event. [Photo: Dian Lofton / TED]

At the end of the program, Residents have the opportunity to give a TED Talk about their work and ideas in the theater at TED HQ. Read more about each Resident below:

The daughter of Syrian immigrants, Maytha Alhassen just received her Ph.D. in American Studies & Ethnicity from USC. She was co-editor of Demanding Dignity: Young Voices From the Front Lines of the Arab Revolutions, and her current work focuses on dignity’s central role in liberation movements.

Farhad Attaie wants to sound an alarm: child health is on the decline for the first time in generations. He is a co-founder of hellosmile, a community-focused healthcare startup that promotes preventative care for children.

Carlos Augusto Bautista Isaza is a Colombian creative technologist and interactive engineer whose work focuses on improving information access. He is currently developing MineSafe, a crowdsourced repository of safe walking paths for areas affected by landmines.

Jackson Bird is a video creator and activist. Since publicly coming out as transgender on YouTube, he has been using digital media to amplify transgender voices and promote accurate, respectful representation of transgender people.

New York–based designer Wendy Brawer is the creator of the Green Map, a tool that uses distinctive iconography to denote green-living, natural, social, and cultural resources. Locally led in 65 countries, the Green Map will soon relaunch with a new, open approach to inspire greater action on climate health and environmental justice among residents and travelers alike.

Formerly director of the MediaLab at the Metropolitan Museum of Art, Marco Antonio Castro Cosío is a designer and technologist. His current project is Bus Roots, a program that puts gardens on the roofs of city buses—to capture rainwater and add green space while also providing a virtual-reality learning experience inside.

Award-winning artist Alison Cornyn is using photography and historical documents to create “Incorrigibles,” an installation and web platform that investigates the incarceration of young women in the US. She also teaches at the School of Visual Arts Design for Social Innovation MA program.

Daniel Gartenberg is a sleep scientist who is testing a new way to detect and improve sleep quality using wearable devices. He is validating his invention in collaboration with Penn State, the National Science Foundation and the National Institute of Aging.

Journalist Duarte Geraldino is documenting the stories of US citizens who’ve lost friends and family to deportation. In the process, he is creating a national archive and shared resource to gauge the impact of the evolving US immigration policy.

TED Resident Bayete Ross Smith, a multimedia artist, meets fellow Resident Fred Kahl, also known at the Coney Island sideshow as sword swallower the Great Fredini. [Dian Lofton / TED]

Anurag Gupta is the founder and CEO of Be More, a social enterprise that employs proven training programs to eradicate unconscious bias. He is also an attorney and a mindfulness expert.

In her two decades of interviewing women and girls, filmmaker and journalist Sue Jaye Johnson documented dozens of stories about the isolation of shame. Now she is focusing on healthy, real-life sex stories from across generations, cultures and orientations—to expand the idea of what’s normal, what’s possible, and who we are as sexual beings.

Fred Kahl, a.k.a. the Great Fredini, is an artist, designer, magician, sword swallower and inventor who uses technology, imagination and play to create surreal, magical experiences. His project is a virtual reality recreation of Coney Island’s famous Luna Park—a turn-of-the-20th-century attraction that showcased fantasy architecture and technological futurism.

Anindya Kundu is a sociologist who studies the qualities that enable disadvantaged students to succeed, despite personal, social and institutional challenges. His book, Achieving Agency, is forthcoming.

A native of Finland, Linda Liukas is the creator of Hello Ruby, a children’s book that teaches programming skills without a computer. A programmer herself, Linda wants to make computers big again—so big that a child can crawl inside and learn how it works from the inside out.

TED Resident Paul Tasner, a first-time entrepreneur at 71, gets to know creative technologist Carlos Bautista, also a new Resident. [Photo: Dian Lofton / TED]

Beth Malone is an artist, curator and social entrepreneur who’s exploring ways art can reimagine (and improve) environment and circumstance. In her case, that includes coping with her father’s dementia, and helping others understand what it may mean if they are confronted with the disease.

Leslie Martinez is a designer and researcher who works at the intersection of immigration and design. She recently organized Hack the Ban, a hackathon that matched creatives and technologists with organizations supporting the Muslim and immigrant communities of New York City.

Matthew Nolan is a social entrepreneur and technologist who founded and developed Verona, an app that promotes world peace by introducing users to others with opposing views.

Evita Turquoise Robinson is the creator of the Nomadness Travel Tribe—a celebration of cultural harmony and curiosity spread by some 15,000 likeminded travelers of color. She is now considering what might happen if black communities use their buying power to invest in international property, while retaining a sense of social and economic consciousness.

Steven Rosenbaum is a digital entrepreneur, filmmaker, author and journalist (7 Days In September, MTV Unfiltered, Curation Nation). He sees the “fake news” crisis as an invitation to rethink the presentation of news, and help consumers tell the difference between fact and opinion.

Bayeté Ross Smith is a multimedia artist who is on the faculty at NYU and the International Center of Photography. His new project examines the mindset and traditions of people of color who serve in combat for the US military, despite knowing that they will continue to face oppression when they return home.

Writer-photographer Katie Salisbury is working on a multimedia project that will tell the stories of Chinese takeout workers in New York City and also examine the significance of Chinese food in American culture.

Uninterested in retirement, Paul Tasner instead launched PulpWorks, Inc., a packaging company that uses only waste paper, agricultural byproducts and textiles as raw materials. His goal is to reduce the billions of pounds of plastic packaging that enter our oceans, waterways and landfills each year.

After losing her nephew to suicide, Emmy-winning TV producer Nikki Webber Allen made it her mission to spark awareness of mental health issues in the African-American community. She believes that the cultural stigma of mental illness in communities of color keeps many people from seeking help, and she is working on a documentary to tell the candid stories of young people dealing with these disorders.


Applications for the Fall 2017 class (which runs September 11 to December 15) open on April 15 at

Planet DebianRaphaël Hertzog: Freexian’s report about Debian Long Term Support, February 2017

Like each month, here comes a report about the work of paid contributors to Debian LTS.

Individual reports

In February, about 154 work hours have been dispatched among 13 paid contributors. Their reports are available:

  • Antoine Beaupré did 3 hours (out of 13 hours allocated, thus keeping 10 extra hours for March).
  • Balint Reczey did 13 hours (out of 13 hours allocated + 1.25 hours remaining, thus keeping 1.25 hours for March).
  • Ben Hutchings did 19 hours (out of 13 hours allocated + 15.25 hours remaining, he gave back the remaining hours to the pool).
  • Chris Lamb did 13 hours.
  • Emilio Pozuelo Monfort did 12.5 hours (out of 13 hours allocated, thus keeping 0.5 hour for March).
  • Guido Günther did 8 hours.
  • Hugo Lefeuvre did nothing and gave back his 13 hours to the pool.
  • Jonas Meurer did 14.75 hours (out of 5 hours allocated + 9.75 hours remaining).
  • Markus Koschany did 13 hours.
  • Ola Lundqvist did 4 hours (out of 13 hours allocated, thus keeping 9 hours for March).
  • Raphaël Hertzog did 3.75 hours (out of 10 hours allocated, thus keeping 6.25 hours for March).
  • Roberto C. Sanchez did 5.5 hours (out of 13 hours allocated + 0.25 hours remaining, thus keeping 7.75 hours for March).
  • Thorsten Alteholz did 13 hours.

Evolution of the situation

The number of sponsored hours increased slightly thanks to Bearstech and LiHAS joining us.

The security tracker currently lists 45 packages with a known CVE and the dla-needed.txt file lists 39. The number of open issues continued its slight increase; this time it can be explained by the fact that many contributors did not spend all of their allocated hours (for various reasons). There’s nothing worrisome at this point.

Thanks to our sponsors

New sponsors are in bold.


Cory DoctorowFair trade ebooks: how authors could double their royalties without costing their publishers a cent

My latest Publishers Weekly column announces the launch date for my long-planned “Shut Up and Take My Money” ebook platform, which allows traditionally published authors to serve as retailers for their publishers, selling their ebooks direct to their fans and pocketing the 30% that Amazon would usually take, as well as the 25% the publisher gives back to them later in royalties.

I’ll be launching the platform with my next novel, Walkaway, in late April, and gradually rolling out additional features, including a name-your-price system inspired by the Humble Bundle and the Ubuntu payment screen.

Selling your own ebooks means that you can have more than one publisher — say, a UK and a US one — and sell on behalf of both of them, meaning that readers anywhere in the world come to one site to buy their books, and the author takes care of figuring out which publisher gets the payment from that purchase.

It’s an idea whose time has come! My UK publisher, Head of Zeus, has just launched a very similar initiative for authors who don’t want to host their own stores: BookGrail.

Buying an e-book from a website and sideloading it onto your Kindle will never be as easy as buying it from the Kindle store (though if the world’s governments would take the eminently sensible step of legalizing jailbreaking, someone could develop a product that let Kindles easily access third-party stores on the obvious grounds that if you buy a Kindle, you still have the right to decide whose books you’ll read on it, otherwise you don’t really own that Kindle). But a bookstore operated by an author has an advantage no giant tech platform can offer: a chance to buy your e-books in a way that directly, manifestly benefits the author.

As an author, being my own e-book retailer gets me a lot. It gets me money: once I take the normal 30 percent retail share off the top, and the customary 25 percent royalty from my publisher on the back-end, my royalty is effectively doubled. It gives me a simple, fair way to cut all the other parts of the value-chain in on my success: because this is a regular retail sale, my publishers get their regular share, likewise my agents. And, it gets me up-to-the-second data about who’s buying my books and where.
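The arithmetic behind that claim can be sketched in a few lines. The figures below are illustrative assumptions for the sake of the example (a $10 ebook, a 30% retailer share, a 25%-of-net royalty), not actual contract terms:

```python
# Compare an author's per-copy take via a retailer vs. self-retailing.
# All rates here are illustrative assumptions, not real contract terms.
price = 10.00
retail_share = 0.30   # the cut a retailer like Amazon normally keeps
royalty_rate = 0.25   # author's royalty on the publisher's net receipts

publisher_net = price * (1 - retail_share)      # publisher receives $7.00 either way
royalty = publisher_net * royalty_rate          # $1.75 flows back to the author

via_retailer = royalty                          # retailer keeps the 30%
self_retail = price * retail_share + royalty    # the author keeps the 30% too

print(f"via a retailer: ${via_retailer:.2f} per copy")
print(f"self-retailing: ${self_retail:.2f} per copy")
```

Under these assumed rates the per-copy take rises from $1.75 to $4.75; the exact multiple depends on a contract’s actual royalty and retail splits.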

It also gets me a new audience that no retailer or publisher is targeting: the English-speaking reader outside of the Anglosphere. Travel in Schengen, for example, and you will quickly learn that there are tens of millions of people who speak English as a second (or third, or fourth) language, and nevertheless speak it better than you ever will. Yet there is no reliable way for these English-preferring readers, who value the writer’s original words, unfiltered by translation, to source legal e-books in English.

Amazon and its competitors typically refuse outright to deal with these customers, unable to determine which publisher has the right to sell to them. Most publishing contracts declare these nominally non-English-speaking places to be “open territory” where in theory all of the book’s publishers may compete, but in practice, none of them do.

London Book Fair 2017: Cory Doctorow Unveils His Latest Publishing Experiment—Fair Trade E-Books

[Cory Doctorow/Publishers Weekly]

Planet DebianEnrico Zini: Django signing signs, does not encrypt

As it says in the documentation, django.core.signing signs; it does not encrypt.

Even though signing.dumps creates obscure-looking tokens, they are not encrypted, and here's a proof:

>>> from django.core import signing
>>> a = signing.dumps({"action":"set-password", "username": "enrico", "password": "SECRET"})
>>> from django.utils.encoding import force_bytes
>>> print(signing.b64_decode(force_bytes(a.split(":",1)[0])))
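The same recovery works with the standard library alone, no Django required. Here is a sketch that mimics the token layout (URL-safe base64 payload, then timestamp and signature segments; the timestamp and signature below are made-up placeholders, not real Django output):

```python
import base64
import json

# Mimic the shape of a django.core.signing token:
# base64(payload):timestamp:signature. The payload segment is plain
# URL-safe base64 -- encoding, not encryption.
data = {"action": "set-password", "username": "enrico", "password": "SECRET"}
payload = base64.urlsafe_b64encode(json.dumps(data).encode()).rstrip(b"=")
token = payload + b":1b2M2Y:fakesignature"  # placeholder timestamp/signature

# Anyone holding the token can recover the payload without any key:
b64 = token.split(b":", 1)[0]
b64 += b"=" * (-len(b64) % 4)  # restore the stripped "=" padding
recovered = json.loads(base64.urlsafe_b64decode(b64))
print(recovered["password"])  # the "secret" is right there
```

The signature at the end only proves the token wasn’t tampered with; it does nothing to hide the contents.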

I'm writing it down so one day I won't be tempted to think otherwise.

CryptogramUsing Intel's SGX to Attack Itself

Researchers have demonstrated using Intel's Software Guard Extensions to hide malware and steal cryptographic keys from inside SGX's protected enclave:

Malware Guard Extension: Using SGX to Conceal Cache Attacks

Abstract: In modern computer systems, user processes are isolated from each other by the operating system and the hardware. Additionally, in a cloud scenario it is crucial that the hypervisor isolates tenants from other tenants that are co-located on the same physical machine. However, the hypervisor does not protect tenants against the cloud provider and thus the supplied operating system and hardware. Intel SGX provides a mechanism that addresses this scenario. It aims at protecting user-level software from attacks from other processes, the operating system, and even physical attackers.

In this paper, we demonstrate fine-grained software-based side-channel attacks from a malicious SGX enclave targeting co-located enclaves. Our attack is the first malware running on real SGX hardware, abusing SGX protection features to conceal itself. Furthermore, we demonstrate our attack both in a native environment and across multiple Docker containers. We perform a Prime+Probe cache side-channel attack on a co-located SGX enclave running an up-to-date RSA implementation that uses a constant-time multiplication primitive. The attack works although in SGX enclaves there are no timers, no large pages, no physical addresses, and no shared memory. In a semi-synchronous attack, we extract 96% of an RSA private key from a single trace. We extract the full RSA private key in an automated attack from 11 traces within 5 minutes.

News article.

Worse Than FailureSoftware on the Rocks: Episode 4: Anarchy for Sale

Thanks to a combination of illnesses, travel, timezones, and the other horrors of the modern world, we took a week off. If Angular can skip version 3, we can skip episode 3. Welcome to Episode 4 of Software on the Rocks, brought to you by Atalasoft.

In today’s episode, we are joined by TDWTF author Jane Bailey. We talk about the process of writing, the nature of programming, and “programmer anarchy”.

This episode of Software on the Rocks is brought to you by Atalasoft.


Web Player

We’ll be back in two weeks, when Alex and Remy finally have a duel to the death. Two programmers enter, only one programmer leaves. Follow future episodes here on the site, or subscribe to our podcast.


Alex: So, I guess this is episode number three now, if I’m keeping track, right?

Remy: So, that makes three episodes that you’ve started by saying, “So, I guess this is…”

Alex: Oh, look. We’ve got – I don’t know. We’re not doing these right after another. Podcasting is pretty new to me, and I don’t know. I think this is a lot of fun. I’m enjoying it. And from what I can tell from reading the comments, the readers are absolutely enjoying it, too. Remy, I want to read this one to you. It’s from Ross, and then in parentheses, “unregistered.” So, let me read you his comment, because it, I really think, speaks to the heart of what we’re doing here. [Clears throat] Okay. “Not that anybody at The Daily WTF will care, but I, too, have no f---ing interest in The Daily WTF podcast, or any other podcast whatsoever. I won’t listen. I won’t read the transcripts. I’m hoping that you continue putting ‘Software on the Rocks’ in the title so I can get my RSS feed to ignore them.” Eh? That’s exciting, yeah?

Jane: Sounds about right. Yeah, that’s what we hope to hear. [ Chuckles ]

Alex: Here’s another commenter that says, “This is just terrible and an absolute waste of time.” They won’t come back, and it’s sad that they won’t hear me thanking them for sharing all of this great feedback to them, but, nonetheless, I’m happy that they listened to at least our first two episodes.

Remy: Hello, everyone. Let’s actually do our introductions. I’m Remy with Alex Papadimoulis, and we also have, today, Jane, who you may know as Bailey from The Daily WTF. She’s one of the writers, contributes a lot of great articles, so I wanted to introduce her and give her a chance to say hello.

Jane: Remy, I am always having fun, but now more so than ever.

Remy: Bailey actually joined the site – It’s a couple of years ago now – shortly after I took on as being chief editor. But as part of me single-handedly ruining the site, I brought in a bunch of people who hadn’t written for the site before and said, “Start writing for the site,” and Jane was one of those folks.

Alex: I think it’s great. The commenters, again – They also seem to just hate it. But they’ve f---ing hated everything ever since day two of the website; so still, we’re on that continued decline from our height. I didn’t really think that Daily WTF would ever evolve into this sort of – I guess you could call it – what? – I.T. fiction? Or I guess you could call it I.T. based on true stories. But it’s really interesting to read, and it’s a fun blend between sort of our day jobs and fiction that captivates and tells stories.

Remy: I don’t know about you, Jane. I have a very specific process I use for turning a submission into a story. Our readers often accuse us of fictionalizing too much or anonymizing too hard. I don’t know about you personally. I actually don’t do very much of that at all. Personally, if I’m fictionalizing, it’s because the submission itself didn’t contain a lot of details, and the details I add are almost always from, like, real-world experiences I’ve had.

Jane: Yeah, I absolutely agree. Sometimes they’re details of coworkers I’ve spoken with, telling me their horror stories, or other things that I’ve seen around The Daily WTF forums, but more often than not, they’re just things that I have seen, you know?

Alex: You know, certainly a lot of folks have written in or have commented and said, “Oh, this is clearly a fabricated story,” or “clearly, you’ve made up a lot of the details.” But, you know, what I think is more accurate, or certainly worth considering, at least, is the folks telling these stories are the ones who have forgotten almost all of the details. Every time you retell that story to your friends, to your coworkers, to whomever, the story gets more and more muddled. So, the reality of what actually happened is really vague to begin with, and if we were to break down those – the bits of the story that they actually share, it would be really boring and certainly no one would be reading it. One of the best bits of feedback that I’ve ever received was from a guy that I met who had sent in a story. He said that “You changed every detail in the story for the most part; however, after I read it, it felt more like that’s exactly what happened than in my own mind.” And that, to me, was a good way to sort of recreate an event in a fictional sense.

Remy: I kind of always think of it more like, back in the ‘50s and ‘60s, you had the “true crime” writers, right? And they would take real-world news stories – you know, murders and gang killings – and they would turn them into these novels that had a lot of fictionalized details. They certainly played up the more salacious elements of the whole thing. But their goal there was to take something that was based in reality, decorate into something that was also entertaining.

Jane: I think some of my favorite submissions are ones where the submitter has clearly not understood what was the most interesting part of the story or possibly is completely in the wrong about what was happening and should have happened to where I can really punch up someone else’s viewpoint and tell a much better story than they submitted just by virtue of, you know, having – being an objective observer and saying, “Well, what you submitted was okay, but this part – this part’s great.”

Alex: Do you guys happen to remember the ITAPPMONROBOT story about the –

Jane: Oh, yeah. Of course I do.

Remy: Oh, yeah.

Alex: It’s one of my absolute favorites, but if you go back to the original submission, and I don’t know if we’ve ever linked it. I know we shared it with the writers, obviously, as well, but, you know, the original submission, if memory serves correct, had to do with an anti-Windows – It was a “Windows is terrible, Linux is great” story. And to the submitter, that was the notable part is that Windows was so bad that he couldn’t reset the server remotely, so he programmed a Linux box to eject a CD drive. Mm, that’s kind of lame. Just – Who cares? Another “Oh, Windows is terrible” story. But we turned it into a much more memorable story. Same exact set of facts packaged and presented completely differently, and it went from being a rant that would be in the comments section somewhere to a pretty memorable story that is one of my favorites and, I think, a lot of readers’ favorites, as well.

Remy: On Jane’s point, there have been a few times where I’ve picked up a submission where the real WTF in this submission was the submitter.

Jane: Yeah.

Remy: You can read their story, and they’re trying to make this argument about how horrible everyone around them is, but you can see clearly they were the ones who were actually in the wrong. One of my personal rules is that, even if I read the submission and the submitter is the real WTF, I don’t write the story from that perspective. The submitter is always going to be the hero of the story, but maybe we move the actual problem or we express the problem in a slightly different way. There’s actually one that – It hasn’t run yet, but it’s going to run before this episode airs, and it falls kind of into that category. Ellis wrote a story called “The Automation Vigilante,” which – It’s actually about the person that, as developers, we tend to have a big problem with. They’re the non-developer who writes a bunch of macros to automate their job.

Alex: I love that title: “The Programmer Vigilante.” Just the image that that’s conjuring up and – Basically, it describes so many people that I’ve met in my career, as well. On the idea – This concept – So, programmer vigilante – It reminds me that one topic that was sort of on our shortlist to bring up today was something that – It wasn’t programmer vigilante. It was what? It was programmer anarchy. Programming anarchy.

Jane: So, I heard about programmer anarchy when I was looking up, you know, different types of agile and some out-there agile concepts. And basically what it is, as an elevator pitch, is a bunch of developers who fired all the managers, fired all the business analysts, fired all the Q.A. people, sat down with the stakeholders, and said, “We’re in charge now.”

Alex: Oh! That sounds about as successful as real-world anarchy.

Remy: This sounds like the textbook definition of the Dunning-Kruger effect.

Jane: Pretty much.

Alex: So, it’s an interesting concept, and I guess if no actual work needs to be done – Again, let’s say like real-world anarchy – I could see it working. But what is the context that this came from? Is this a methodology that’s espoused by, oh, I don’t know, somebody just who’s been completely embroiled in this Dunning-Kruger effect and thinks this is a good way to run a business?

Jane: Well, and that’s the fascinating part is that it worked really well for them. But the thing is, they started with a bunch of senior developers who really knew the business, inside and out, had been right there from the beginning, with the stakeholders. They worked in an industry that was very fast-paced, to where the biggest risk was not moving. So as long as you did something, even if it failed, you would come away with better payouts than doing nothing.

Alex: So, I guess it’s this idea that it turns out that, yes, they said they fired all the business people, they fired all the other folks, but they were those people. They had knowledge – very specific knowledge of whatever specific industry they were in, which completely invalidates the entire thing. That’s like saying, “Well, anarchy will work as long as it’s these 12 people on this island who will run it like a utopia.”

Jane: Yeah. I mean, that’s basically it. If you have a bunch of high-performing senior developers who know the business like the back of their hand, you could do anything and you would come away with a success.

Alex: Now, what I see we’re doing here, Jane, is that we are building up this wonderful effigy made of straw and then setting it ablaze. So, Remy, I’m hoping that you can come in and defend programmer anarchy.

Remy: I can, actually, Alex. And feel free to stop me, Jane, if I’m kind of going off the path, but I’m almost agreeing, actually, with Jane, even as I say I’m gonna defend programmer anarchy, because here’s a few things to keep in mind. First off, if you are engaged in any creative effort – and programming is a creative effort. I think we can agree on that much. It’s a technical field, but it’s very much a creative effort. If you are engaged in a creative effort, handing the reins to people who are passionate about what’s being created – now, not passionate for the tools, right? You don’t hire a sculptor because they love a chisel. You hire a sculptor because they are passionate about sculpting, right? So, people who are passionate about what they’re making, people who know the business, in this case, right, who are intimate with the business – Jane’s absolutely right. You get those people involved, no matter what you do, you’re gonna see at least some degree of success. But here’s the other thing. In a lot of organizations, there is a big pile of management overhead, and I have actually said this many times. If you want to make an organization perform better, fire all the middle managers.

Alex: So, that’s a couple interesting points. Now, first, I don’t think I can let you get away with the “programming is creative.” You know what’s creative? Music is creative. Sculpting is creative. Um, programming is engineering – solving a very specific problem to achieve a very specific business goal. Obviously, the mechanism of that problem is not well-defined, but that does not mean that it’s a creative endeavor.

Jane: See, I also have to disagree a little bit, Alex. I think you have to separate the act of programming from the more creative endeavor of software engineering. I think there’s a lot of different particular disciplines within, you know, computer science and within the art of making a program, some of which are more creative and some of which are purely mechanical.

Alex: Well, and that’s certainly fair, because software eventually does interact with users, but that’s a whole field of discipline called user experience, which is quite a bit different than, say, “Let’s figure out how to write a process map that takes this loan and runs it through a dozen different underwriting programs.”

Remy: If you are presented with a problem space – Let’s get programming out of here entirely. We’re building a factory. We need to take raw materials through a factory and get widgets on the other side. This is a broad problem space. There is a nearly infinite number of ways that we could accomplish that goal. There are definitely ways that are better than others. There are definitely ways that are almost equivalent but have certain tradeoffs that make them better in certain cases than others. And I would argue that the act of taking a huge problem space and cutting that problem space down to the appropriate solution is an act of creativity.

Alex: Ultimately, what we’re looking at is software that’s designed to implement a business process. A lot of times, software defines that business process, but when we’re saying that developing this code behind a pre-defined – And I’ll give you that there’s a creativity in defining that business process, but in implementing and in software, where’s the creativity outlet?

Jane: Being that, you know, I have my fingers in a lot of pies – I’m sort of a modern Renaissance woman – one of the many things I am is a writer, and so, inevitably, my mind goes to try and liken software design to writing a novel – particularly a novel. A short story’s a little easier. But nobody can doubt that coming up with the idea for a novel and, you know, the setting and the plot is purely creative, but at the same time, typing the words on the keyboard is a purely mechanical task, to me, akin to writing lines of software code. There’s only one right way to spell each word, and there’s only a handful of right ways to link them together into sentences.

Remy: Jane, I really love that metaphor, because as you were saying it, it dawned on me – If you are in the process of writing a novel, one of the decisions that is both a mechanical decision and a creative decision is, in the space of an individual sentence, am I going to describe what I’m telling the reader about? I have a character who’s wearing a coat. Do I describe the coat in a literal description or do I provide a more metaphorical description or a simile that conjures a different image than the strict literal description? And that’s something that, if we were to think about this from a programming perspective, that’s a question about what abstraction tools we use to express an idea. So, here’s an example where there is that element of creative thought in programming where I could write a very literal program that uses almost no abstraction, right? I mean, go into your operating system, right? You go into kernel code, and kernel code is extremely literal. It expresses things in a very concrete fashion. For a more high-level language, we are gonna be using these more abstract tools, and that’s where a certain degree of expressiveness comes in, where there are a lot of ways to say the idea you want to say.

Jane: To do it well, at least.

Alex: Well, and I think, though, the same case could be made for every single engineering profession.

Remy: I agree.

Alex: Building a bridge, a skyscraper. But, you know, the difference between engineers, creative professionals, and programmers, who seem to want to live in both worlds when convenient for them, is that engineers have things like schedules.

Remy: This is where I have to disagree with you on creative work so, so much. Because one of the creative endeavors that I have – and I’m strictly a hobbyist, and I am very bad – but I like directing short films. And if you are going out to direct a film, there are a lot of people and there are a lot of resources that you have to manage through that filmmaking process. That requires schedules. It requires budgets. It requires – You know, you have to have a call sheet that tells all of your cast and crew where to be when and what they’re gonna be doing once they get there. This is where you do need a management organization. And I know I’m kind of contradicting myself, ‘cause I just said, “Fire all the middle managers,” but I want to be specific there. Fire all the middle managers.

Alex: I mean, to an extent, but now you’re talking about – And this, I guess, was the point that I was driving towards, right, is that, you know, on the creative side, of course there’s deadlines, but you know what creative people do. They just work straight up until the deadline. If you want to work the eight-hour days, which, by the way, many, many people prefer to do, because this isn’t fun for them. This is work for them. So, for the people who do want to work those eight-hour days, we really do need to build structures, an organizational system of ways to manage the eight hours that each programmer, each developer, each analyst is able to work each day. And, you know, going to this middle manager notion – Of course they’re very, very valuable. Any good manager – what – eight to ten people? That’s about all that you can really manage as a team. What happens when you have 100 developers?

Remy: I think there’s a lot to mine on this topic. And I’d love to continue that, but I think that’s way too big a topic for this. Let’s pull it back to programmer anarchy as an organizational technique. And I want to hear Jane’s, you know, key thoughts and key takeaways about programmer anarchy.

Jane: Yeah, well, you know, to me, the most interesting thing that I found out about programmer anarchy when I was looking into it – The original white paper that they put together, they talked, you know, proudly about all the things they got rid of. They got rid of all the testing. They got rid of iterations. They got rid of user stories. They went to, like, a microservice architecture where, you know, they felt like they would never have to refactor code again because they can just delete it all and start over because each service was so small. They got rid of paired programming. And two years later, one of the two authors of the paper sat and, you know, went through, to him, how successful it was. And he says, “Well, if you look around the company today, everyone’s doing stand-ups. A lot of people ended up liking the idea of refactoring code because it was pleasant to them to see their code get progressively better over time. And they’ve started doing paired programming again because they miss the opportunity to learn from another developer.” So ultimately, what they discovered was that self-organizing teams are particularly effective when you have senior developers on them and, also, a lot of the things that were invented were invented for a reason.

Alex: That’s something that’s always fascinated me about organizations. We look at them, we mock them, but the reality is it’s kind of difficult to take billions of dollars and turn it into slightly more than billions of dollars. Programmer anarchy really speaks to this notion that a small group of self-organizing folks – I’d almost look at it as this notion of a strike team – you know, in military terms can just go in. A really good strike team, maybe of ten people, can clear an entire skyscraper, right, floor by floor. It’ll take a good one to do it, the most senior, most agile, ninja-level skills to do that, but there’s no way that even the best strike team in the world can take a block or a city, for that matter. That’s where you need an army. And I think that’s what we’re looking at here with programmer anarchy is it works just fine when you’ve got a strike team, but they’re not gonna be able to build a giant system. It’s just not possible because it’s too big for them.

Remy: I think that is an excellent point to wrap up this episode on. I think we’ve got some grist for the mill for some future episodes. Jane, I want to thank you for hanging out with us today on this podcast. I communicate with the writers mostly via e-mail, so it’s kind of a nice treat to actually get to hear one of the voices on the other side of that e-mail chain.

Jane: It was my pleasure.

Alex: Definitely, and hopefully you can join us for one of the Radio WTF things that I’m sure Lorne is working on and will have for us soonish.

Remy: Yeah, maybe the person who’s the editor of the site should contact Lorne and make sure that that’s going into progress. That’s, uh…

Alex: Remy, this is how you’ve ruined the site.

Anarchy for Sale.

Planet DebianWouter Verhelst: Codes of Conduct

These days, most large FLOSS communities have a "Code of Conduct"; a document that outlines the acceptable (and possibly not acceptable) behaviour that contributors to the community should or should not exhibit. By writing such a document, a community can arm itself more strongly in the fight against trolls, harassment, and other forms of antisocial behaviour that is rampant on the anonymous medium that the Internet still is.

Writing a good code of conduct is no easy matter, however. I should know -- I've been involved in such a process twice; once for Debian, and once for FOSDEM. While I was the primary author for the Debian code of conduct, the same is not true for the FOSDEM one; I was involved, and I did comment on a few early drafts, but the core of FOSDEM's current code was written by another author. I had wanted to write a draft myself, but then this one arrived and I didn't feel like I could improve it, so it remained.

While it's not easy to come up with a Code of Conduct, there (luckily) are others who walked this path before you. On the "geek feminism" wiki, there is an interesting overview of existing Open Source community and conference codes of conduct, and reading one or more of them can provide one with some inspiration as to things to put in one's own code of conduct. That wiki page also contains a paragraph "Effective codes of conduct", which says (amongst others) that a good code of conduct should include

Specific descriptions of common but unacceptable behaviour (sexist jokes, etc.)

The attentive reader will notice that such specific descriptions are noticeably absent from both the Debian and the FOSDEM codes of conduct. This is not because I hadn't seen the above recommendation (I had); it is because I disagree with it. I do not believe that adding a list of "don't"s to a code of conduct is a net positive to it.

Why, I hear you ask? Surely having a list of things that are not welcome behaviour is a good thing, which should be encouraged? Surely such a list clarifies the kind of things your community does not want to see? Having such a list will discourage that bad behaviour, right?

Well, no, I don't think so. And here's why.

Enumerating badness

A list of things not to do is like a virus scanner. For those not familiar with these: on some operating systems, there is a specific piece of software that everyone recommends you run, which checks if particular blobs of data appear in files on the disk. If they do, then these files are assumed to be bad, and are kicked out. If they do not, then these files are assumed to be not bad, and are left alone (for the most part).

This works if we know all the possible types of badness; but as soon as someone invents a new form of badness, suddenly your virus scanner is ineffective. Additionally, it also means you're bound to continually have to update your virus scanner (or, as the case may be, code of conduct) to a continually changing hostile world. For these (and other) reasons, enumerating badness is listed as number 2 in security expert Marcus Ranum's "six dumbest ideas in computer security," which was written in 2005.
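
The virus-scanner analogy can be made concrete. Here is a minimal sketch (Python for illustration, with a hypothetical signature database) of signature-based scanning: it only catches badness that is already enumerated, and any newly invented variant sails through.

```python
# "Enumerating badness": flag data only if it matches a known-bad signature.
def scan(data: bytes, signatures: list[bytes]) -> bool:
    """Return True if any known-bad blob appears in the data."""
    return any(sig in data for sig in signatures)

known_bad = [b"EVIL_PAYLOAD_v1"]  # hypothetical signature database

scan(b"header EVIL_PAYLOAD_v1 trailer", known_bad)  # True: caught
scan(b"header EVIL_PAYLOAD_v2 trailer", known_bad)  # False: new badness slips through
```

The list has to be updated for every new variant, which is exactly the maintenance treadmill described above.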

In short, a list of "things not to do" is bound to be incomplete; if the goal is to clarify the kind of behaviour that is not welcome in your community, it is usually much better to explain the behaviour that is wanted, so that people can infer (by their absence) the kind of behaviour that isn't welcome.

This neatly brings me to my next point...

Black vs White vs Gray.

The world isn't black-and-white. We could define a list of welcome behaviour -- let's call that the whitelist -- or a list of unwelcome behaviour -- the blacklist -- and assume that the work is done after doing so. However, that wouldn't be true. For every item on either the white or black list, there's going to be a number of things that fall somewhere in between. Let's call those things as being on the "gray" list. They're not the kind of outstanding behaviour that we would like to see -- they'd be on the white list if they were -- but they're not really obvious CoC violations, either. You'd prefer it if people don't do those things, but it'd be a stretch to say they're jerks if they do.

Let's clarify that with an example:

Is it a code of conduct violation if you post links to pornography websites on your community's main development mailinglist? What about jokes involving porn stars? Or jokes that denigrate women, or that explicitly involve some gender-specific part of the body? What about an earring joke? Or a remark about a user interacting with your software, where the women are depicted as not understanding things as well as men? Or a remark about users in general, that isn't written in a gender-neutral manner? What about a piece of self-deprecating humor? What about praising someone else for doing something outstanding?

I'm sure most people would agree that the first case in the above paragraph should be a code of conduct violation, whereas the last case should not be. Some of the items in the list in between are clearly on one or the other side of the argument, but for others the jury is out. Let's call those as being in the gray zone. (Note: no, I did not mean to imply that the list is ordered in any way ;-)

If you write a list of things not to do, then by implication (because you didn't mention them), the things in the gray area are okay. This is especially problematic when it comes to things that are borderline blacklisted behaviour (or that should be blacklisted but aren't, because your list is incomplete -- see above). In such a situation, you're dealing with people who are jerks but can argue about it because your definition of jerk didn't cover their behaviour. Because they're jerks, you can be sure they'll do everything in their power to waste your time about it, rather than improving their behaviour.

In contrast, if you write a list of things that you want people to do, then by implication (because you didn't mention it), the things in the gray area are not okay. If someone slips and does something in that gray area anyway, then that probably means they're doing something borderline not-whitelisted, which would be mildly annoying but doesn't make them jerks. If you point that out to them, they might go "oh, right, didn't think of it that way, sorry, will aspire to be better next time". Additionally, the actual jerks and trolls will have been given less tools to argue about borderline violations (because the border of your code of conduct is far, far away from jerky behaviour), so less time is wasted for those of your community who have to police it (yay!).

In theory, the result of a whitelist is a community of people who aspire to be nice people, rather than a community of people who simply aspire to be "not jerks". I know which kind of community I prefer.

Giving the wrong impression

During one of the BOFs that were held while I was drafting the Debian code of conduct, it was pointed out to me that a list of things not to do may give the impression to people that all these things on this list do actually happen in the code's community. If that is true, then a very long list may produce the impression that the given community is a community with a lot of problems.

Instead, a whitelist-based code of conduct will provide the impression that you're dealing with a healthy community. Whether that is the case obviously depends on more factors than just the code of conduct itself, but it will put people in the right mindset for this to become something of a self-fulfilling prophecy.


Given all of the above, I think a whitelist-based code of conduct is a better idea than a blacklist-based one. Additionally, in the few years since the Debian code of conduct was accepted, it is my impression that the general atmosphere in the Debian project has improved, which would seem to confirm that the method works (but YMMV, of course).

At any rate, I'm not saying that blacklist-based codes of conduct are useless. However, I do think that whitelist-based ones are better; and hopefully, you now agree, too ;-)

Planet DebianBen Hutchings: Debian LTS work, February 2017

I was assigned 13 hours of work by Freexian's Debian LTS initiative and carried over 15.25 from January. I worked 19 hours and have returned the remaining 9.25 hours to the general pool.

I prepared a security update for the Linux kernel and issued DLA-833-1. However, I spent most of my time catching up with a backlog of fixes for the Linux 3.2 longterm stable branch. I issued two stable updates (3.2.85, 3.2.86).

Krebs on SecurityFour Men Charged With Hacking 500M Yahoo Accounts

“Between two evils, I always pick the one I never tried before.” -Karim Baratov (paraphrasing Mae West)

The U.S. Justice Department today unsealed indictments against four men accused of hacking into a half-billion Yahoo email accounts. Two of the men named in the indictments worked for a unit of the Russian Federal Security Service (FSB) that serves as the FBI’s point of contact in Moscow on cybercrime cases. Here’s a look at the accused, starting with a 22-year-old who apparently did not try to hide his tracks.

According to a press release put out by the Justice Department, among those indicted was Karim Baratov (a.k.a. Kay, Karim Taloverov), a Canadian and Kazakh national who lives in Canada. Baratov is accused of being hired by the two FSB officer defendants in this case — Dmitry Dokuchaev, 33, and Igor Sushchin, 43 — to hack into the email accounts of thousands of individuals.

Karim Baratov (a.k.a. Karim Taloverov), as pictured in 2014 on his own site. The license plate on his BMW pictured here is Mr. Karim.

Reading the Justice Department’s indictment, it would seem that Baratov was perhaps the least deeply involved in this alleged conspiracy. That may turn out to be true, but he also appears to have been the least careful about hiding his activities, leaving quite a long trail of email hacking services that took about 10 minutes of searching online to trace back to him specifically.

Security professionals are fond of saying that any system is only as secure as its weakest link. It would not be at all surprising if Baratov was the weakest link in this conspiracy chain.

A look at Mr. Baratov’s Facebook and Instagram photos indicates he is heavily into high-performance sports cars. His profile picture shows two of his prized cars — a Mercedes and an Aston Martin — parked in the driveway of his single-family home in Ontario.

A simple reverse WHOIS search on the name Karim Baratov turns up 81 domains registered to someone by this name in Ontario. Many of those domains include the names of big email providers like Google and Yandex, such as accounts-google[dot]net and www-yandex[dot]com.

Other domains appear to be Web sites selling email hacking services. One of those is a domain registered to Baratov’s home address in Ancaster, Ontario called infotech-team[dot]com. A cached copy of that site from shows this once was a service that offered “quality mail hacking to order, without changing the password.” The service charged roughly $60 per password.'s cache of, an email hacking service registered to Baratov.

The proprietors of Infotech-team[dot]com advertise the ability to steal email account passwords without actually changing the victim’s password. According to the Justice Department, Baratov’s service relied on “spear phishing” emails that targeted individuals with custom content and enticed the recipient into clicking a link.

Antimail[dot]org is another domain registered to Baratov that was active between 2013 and 2015. It advertises “quality-mail hacking to order!”:


Another email hacking business registered to Baratov is xssmail[dot]com, which also has for several years advertised the ability to break into email accounts of virtually all of the major Webmail providers. XSS is short for “cross-site-scripting.” XSS attacks rely on vulnerabilities in Web sites that don’t properly parse data submitted by visitors in things like search forms or anyplace one might enter data on a Web site.

In the context of phishing links, the user clicks the link and is actually taken to the domain he or she thinks she is visiting (e.g., but the vulnerability allows the attacker to inject malicious code into the page that the victim is visiting.

This can include fake login prompts that send any data the victim submits directly to the attacker. Alternatively, it could allow the attacker to steal “cookies,” text files that many sites place on visitors’ computers to validate whether they have visited the site previously, as well as if they have authenticated to the site already.'s cache of
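
The difference between a page that reflects user input verbatim and one that escapes it can be sketched in a few lines. This is a hypothetical comment renderer, not any specific site's code:

```python
from html import escape

def render_comment_unsafe(comment: str) -> str:
    # User input is interpolated straight into markup, so an
    # injected <script> tag becomes live code in the victim's page.
    return "<div class='comment'>" + comment + "</div>"

def render_comment_safe(comment: str) -> str:
    # Entity-encoding makes the browser treat the input as text, not markup.
    return "<div class='comment'>" + escape(comment) + "</div>"

payload = "<script>send(document.cookie)</script>"
render_comment_unsafe(payload)  # the script tag survives intact
render_comment_safe(payload)    # rendered as &lt;script&gt;... -- harmless text
```

Real XSS attacks exploit exactly this kind of missing escaping in search forms and other inputs, which is why the injected fake login prompt or cookie-stealing script runs with the trusted site's own origin.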

Perhaps instead of or in addition to using XSS attacks in targeted phishing emails, Baratov also knew about or had access to other cookie-stealing exploits collected by another accused in today’s indictments: Russian national Alexsey Alexseyevich Belan.

According to government investigators, Belan has been on the FBI’s Cyber Most Wanted list since 2013 after breaking into and stealing credit card data from a number of e-commerce companies. In June 2013, Belan was arrested in a European country on request from the United States, but the FBI says he was able to escape to Russia before he could be extradited to the U.S.

A screenshot from the FBI's Cyber Most Wanted List for Alexsey Belan.

The government says the two other Russian nationals who were allegedly part of the conspiracy to hack Yahoo — the aforementioned FSB Officers Dokuchaev and Sushchin — used Belan to gain unauthorized access to Yahoo’s network. Here’s what happened next, according to the indictments:

“In or around November and December 2014, Belan stole a copy of at least a portion of Yahoo’s User Database (UDB), a Yahoo trade secret that contained, among other data, subscriber information including users’ names, recovery email accounts, phone numbers and certain information required to manually create, or ‘mint,’ account authentication web browser ‘cookies’ for more than 500 million Yahoo accounts.

“Belan also obtained unauthorized access on behalf of the FSB conspirators to Yahoo’s Account Management Tool (AMT), which was a proprietary means by which Yahoo made and logged changes to user accounts. Belan, Dokuchaev and Sushchin then used the stolen UDB copy and AMT access to locate Yahoo email accounts of interest and to mint cookies for those accounts, enabling the co-conspirators to access at least 6,500 such accounts without authorization.”
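
To see why “minting” cookies bypasses passwords entirely, consider a toy signed-cookie scheme (purely illustrative; Yahoo's actual cookie format is not public). Whoever holds the server-side signing material can forge a valid session cookie for any account without ever touching its password:

```python
import hashlib
import hmac

# Hypothetical server-side signing key; it stands in for whatever secret
# material the stolen UDB data and AMT access effectively gave the attackers.
SECRET = b"server-side-signing-key"

def mint_cookie(username: str) -> str:
    """Forge a session cookie for any account -- no password required."""
    sig = hmac.new(SECRET, username.encode(), hashlib.sha256).hexdigest()
    return username + "|" + sig

def validate_cookie(cookie: str) -> bool:
    """What the server does: recompute the signature and compare."""
    username, _, sig = cookie.rpartition("|")
    expected = hmac.new(SECRET, username.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)

validate_cookie(mint_cookie("victim@example.test"))  # accepted as logged in
```

A cookie minted this way is indistinguishable from one issued after a genuine login, which is how the conspirators could walk into thousands of accounts directly.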

U.S. investigators say Dokuchaev was an FSB officer assigned to Second Division of FSB Center 18, also known as the FSB Center for Information Security. Dokuchaev’s colleague Sushchin was an FSB officer and embedded as a purported employee and Head of Information Security at a Russian financial firm, where he monitored the communications of the firm’s employees.


According to the Justice Department, some victim accounts that Dokuchaev and Sushchin asked Belan and Baratov to hack were of predictable interest to the FSB (a foreign intelligence and law enforcement service), such as personal accounts belonging to Russian journalists; Russian and U.S. government officials; employees of a prominent Russian cybersecurity company; and numerous employees of other providers whose networks the conspirators sought to exploit. Other personal accounts belonged to employees of commercial entities, such as a Russian investment banking firm, a French transportation company, U.S. financial services and private equity firms, a Swiss bitcoin wallet and banking firm and a U.S. airline.

“During the conspiracy, the FSB officers facilitated Belan’s other criminal activities, by providing him with sensitive FSB law enforcement and intelligence information that would have helped him avoid detection by U.S. and other law enforcement agencies outside Russia, including information regarding FSB investigations of computer hacking and FSB techniques for identifying criminal hackers,” the Justice Department charged in its press statement about the indictments.

“Additionally, while working with his FSB conspirators to compromise Yahoo’s network and its users, Belan used his access to steal financial information such as gift card and credit card numbers from webmail accounts; to gain access to more than 30 million accounts whose contacts were then stolen to facilitate a spam campaign; and to earn commissions from fraudulently redirecting a subset of Yahoo’s search engine traffic,” the government alleges.


Each of the four men faces 47 criminal charges, including conspiracy, computer fraud, economic espionage, theft of trade secrets and aggravated identity theft.

Dokuchaev, who is alleged to have used the hacker nickname “Forb,” was arrested in December in Moscow. According to a report by the Russian news agency Interfax, Dokuchaev was arrested on charges of treason for allegedly sharing information with the U.S. Central Intelligence Agency (CIA). For more on that treason case, see my Jan. 28, 2017 story, A Shakeup in Russia’s Top Cybercrime Unit.

For more on Dokuchaev’s allegedly checkered past (Russian news sites report that he went to work for the FSB to avoid being prosecuted for bank fraud) check out this fascinating story from Russian news outlet Vedomosti, which featured an interview with the hacker Forb from 2004.

In September 2016, Yahoo first disclosed the theft of 500 million accounts that is being attributed to this conspiracy. But in December 2016, Yahoo acknowledged a separate hack from 2013 had jeopardized more than a billion user accounts.

The New York Times reports that Yahoo said it has not been able to glean much information about that attack, which was uncovered by InfoArmor, an Arizona security firm. Interestingly, that attack also involved the use of forged Yahoo cookies, according to a statement from Yahoo’s chief information security officer.

The one alleged member of this conspiracy who would have been simple to catch is Baratov, as he does not appear to have hidden his wealth and practically peppers the Internet with pictures of six-figure sports cars he has owned over the years.

Baratov was arrested on Tuesday in Canada, where the matter is now pending with Canadian authorities. U.S. prosecutors are now trying to seize Baratov’s black Mercedes Benz C54 and his Aston Martin DBS, arguing that they were purchased with the proceeds from cybercrime activity.

A redacted copy of the indictment is available here.

Update, Mar. 16, 5:20 p.m. ET: A previous caption on one of the above photos misidentified the make and model of a car. Also, an earlier version of this story incorrectly stated that Yahoo had attributed its 2013 breach to a state-sponsored actor; the company says it has not yet attributed that intrusion to any one particular actor.


Cory DoctorowPreorder my novel Walkaway and get a pocket multitool

Tor has produced a multitool to commemorate my forthcoming novel Walkaway, and if you pre-order the book, they’ll send you one! Protip: pre-order from Barnes and Noble and you’ll get a signed copy!

The book has received some humblingly great early notices:

Edward Snowden: Is Doctorow’s fictional Utopia bravely idealistic or bitterly ironic? The answer is in our own hands. A dystopian future is in no way inevitable; Walkaway reminds us that the world we choose to build is the one we’ll inhabit. Technology empowers both the powerful and the powerless, and if we want a world with more liberty and less control, we’re going to have to fight for it.

William Gibson: The darker the hour, the better the moment for a rigorously-imagined utopian fiction. Walkaway is now the best contemporary example I know of, its utopia glimpsed after fascinatingly-extrapolated revolutionary struggle. A wonderful novel: everything we’ve come to expect from Cory Doctorow and more.

Kim Stanley Robinson: Cory Doctorow is one of our most important science fiction writers, because he’s also a public intellectual in the old style: he brings the news and explains it, making clearer the confusions of our wild current moment. His fiction is always the heart of his work, and this is his best book yet, describing vividly the revolutionary beginnings of a new way of being. In a world full of easy dystopias, he writes the hard utopia, and what do you know, his utopia is both more thought-provoking and more fun.

Neal Stephenson: Cory Doctorow has authored the Bhagavad Gita of hacker/maker/burner/open source/git/gnu/wiki/99%/adjunct faculty/Anonymous/shareware/thingiverse/cypherpunk/LGBTQIA*/squatter/upcycling culture and zipped it down into a pretty damned tight techno-thriller with a lot of sex in it.

Yochai Benkler: A beautifully-done utopia, just far enough off normal to be science fiction, and just near enough to the near-plausible, on both the utopian and dystopian elements, to be almost programmatic…a sheer delight.

Kirkus Review: A truly visionary techno-thriller that not only depicts how we might live tomorrow, but asks why we don’t already.

CryptogramIoT Teddy Bear Leaked Personal Audio Recordings

CloudPets are Internet-connected stuffed animals that allow children and parents to send each other voice messages. Last week, we learned that Spiral Toys had such poor security that it exposed 800,000 customer credentials and two million audio recordings.

As we've seen time and time again in the last couple of years, so-called "smart" devices connected to the internet -- what is popularly known as the Internet of Things or IoT -- are often left insecure or are easily hackable, and often leak sensitive data. There will be a time when IoT developers and manufacturers learn the lesson and make secure-by-default devices, but that time hasn't come yet. So if you are a parent who doesn't want your loving messages with your kids leaked online, you might want to buy a good old-fashioned teddy bear that doesn't connect to a remote, insecure server.

That's about right. This is me on that issue from 2014.

Sociological ImagesGoogle, Tell Me. Is My Son Gay?

Originally posted at Feminist Reflections.

In 2014, a story in The New York Times by Seth Stephens-Davidowitz went viral using Google Trends data to address gender bias in parental assessments of their children—“Google, Tell Me. Is My Son a Genius?”  People ask Google whether sons are “gifted” at a rate 2.5x higher than they do for daughters.  When asking about sons on Google, people are also more likely to inquire about genius, intelligence, stupidity, happiness, and leadership than they are about daughters.  When asking about daughters on Google, people are much more likely to inquire about beauty, ugliness, body weight, and just marginally more likely to ask about depression.  It’s a pretty powerful way of showing that we judge girls based on appearance and boys based on abilities.  It doesn’t mean that parents are necessarily consciously attempting to reproduce gender inequality.  But it might mean that they are simply much more likely to take note of and celebrate different elements of who their children are depending on whether those children are girls or boys.

To get the figures, Stephens-Davidowitz relied on data from Google Trends. The tool does not give you a sense of the total number of searches utilizing specific search terms; it presents the relative popularity of search terms compared with one another on a scale from 0 to 100, and over time (since 2004).  For instance, it allows people selling used car parts to see whether people searching for used car parts are more likely to search for “used car parts,” “used auto parts,” or something else entirely before they decide how to list their merchandise online.  I recently looked over the data the author relied on for the piece.  Stephens-Davidowitz charted searches for “is my son gifted” against searches for “is my daughter gifted” and then replaced that last word in the search with: smart, beautiful, overweight, etc.
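Google Trends never exposes raw search counts; the 0–100 scale is a peak-normalization of whatever terms and time window you ask for. Conceptually it works something like this sketch (function name and example numbers are mine, not from Google):

```python
def trends_scale(counts):
    """Rescale raw search counts the way Google Trends presents them:
    the peak value in the window maps to 100, everything else is
    proportional and rounded to an integer."""
    peak = max(counts)
    return [round(100 * c / peak) for c in counts]

# e.g. hypothetical weekly counts for one search term over five weeks
print(trends_scale([120, 300, 150, 600, 90]))  # → [20, 50, 25, 100, 15]
```

This is why the tool is good for comparing the relative popularity of “used car parts” versus “used auto parts,” but cannot tell you how many people searched for either.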

And while people are more likely to turn to Google to ask about their sons’ intelligence than about whether or not their daughters are overweight, people are much more likely to ask Google about children’s sexualities than about any other quality mentioned in the article.  And to be even more precise, parents on Google are primarily concerned with boys’ sexuality.  Below, I’ve charted the relative popularity of searches for “is my son gay” alongside searches for “is my daughter gay,” “is my child gay,” and “is my son gifted.”  I included “child” to illustrate that Google searches here are more commonly gender-specific.  And I include “gifted” to illustrate how much more common searches for sons’ sexuality are compared with searches for sons’ giftedness (which was among the more common searches in Stephens-Davidowitz’s article).


The general trend of the graph is toward increasing popularity.  People have become more likely to ask Google about their children’s sexuality since 2004 (and slightly less likely to ask Google about their children’s “giftedness” over that same time period).  But they are much more likely to inquire about sons’ sexuality.  At two points, the graph hits the ceiling.  The first, in November of 2010, corresponds with the release of the movie “Oy Vey! My Son is Gay,” about a Jewish family coming to terms with a son coming out as gay and dating a non-Jewish young man.  The second high point, in September of 2011, occurred during a great deal of press surrounding Apple’s recently released “Is my son gay?” app, which was later taken off the market after a great deal of protest.  Certainly, some residual popularity in searches may be associated with increased relative search volume since then.  But the increase in relative searches for “is my son gay” happens earlier than either of these events.

Indeed, over the period of time illustrated here, people were 28x more likely to search for “is my son gay” than they were for “is my son gifted.”  And searches for “is my son gay” were 4.7x more common than searches for “is my daughter gay.”

Reading Google Trends is a bit like reading tea leaves in that it’s certainly open to interpretation.  For instance, this could mean that parents are increasingly open to sexual diversity and are increasingly attempting to help their children navigate coming to terms with their sexual identities (whatever those identities happen to be).  Though, were this the case, it’s interesting that parents are apparently more interested in helping their sons navigate any presumed challenges than their daughters.  It could mean that as performances of masculinity shift and take on new forms, sons are simply much more likely to engage with gender in ways that cause their parents to question their (hetero)sexuality than they used to.  Or it could mean that parents are more scared that their sons might be gay.  It is likely all of these things.

I’m not necessarily sold on the idea that the trend can only be seen as a sign of the endurance of gender and sexual inequality.  But one measure of that might be to check back in with Google Trends to see if people start asking Google whether their sons and daughters are straight.  At present, both searches are uncommon enough that Google Trends won’t even display their relative popularity.

Tristan Bridges, PhD is a professor at The College at Brockport, SUNY. He is the co-editor of Exploring Masculinities: Identity, Inequality, Continuity, and Change with C.J. Pascoe and studies gender and sexual identity and inequality. You can follow him on Twitter here. Tristan also blogs regularly at Inequality by (Interior) Design.


Planet DebianDirk Eddelbuettel: RcppEigen

A new maintenance release of RcppEigen, still based on Eigen 3.2.9, is now on CRAN and will make its way into Debian soon.

This update ensures that RcppEigen and the Matrix package agree on their #define statements for the CholMod / SuiteSparse library. Thanks to Martin Maechler for the pull request. I also added a file src/init.c as now suggested (soon: requested) by the R CMD check package validation.

The complete NEWS file entry follows.

Changes in RcppEigen version (2017-03-14)

  • Synchronize CholMod header file with Matrix package to ensure binary compatibility on all platforms (Martin Maechler in #42)

  • Added file init.c with calls to R_registerRoutines() and R_useDynamicSymbols(); also use .registration=TRUE in useDynLib in NAMESPACE

Courtesy of CRANberries, there is also a diffstat report for the most recent release.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

CryptogramFrance Abandons Plans for Internet Voting

Some good election security news for a change: France is dropping its plans for remote Internet voting, because it's concerned about hacking.

Planet DebianMichal Čihař: Life of free software project

Over the last week I've noticed several interesting posts about the challenges of being a free software maintainer. After being active in open source for 16 years, I share many of the feelings described there, and I can also share how I deal with these things.

First of all let me link some of the other posts on the topic:

I guess everybody involved in a popular free software project knows it: there is much more work to be done than the people behind the project can handle. It really doesn't matter whether it's bug reports, support requests, new features or technical debt; it's simply too much. If you are the only one behind the project, it can feel even more pressing.

There are several approaches to deal with that, but you have to choose what you prefer and what is going to work for you and your project. I've used all of the approaches mentioned below on some of my projects, but I don't think there is a silver bullet.

Finding more people

Obviously, if you cannot cope with the work, find more people to do it. Unfortunately it's not that easy. Sometimes people come by and contribute a few patches, but it's hard to turn them into regular contributors. You should encourage them to stay and to care about the part of the project they have touched.

You can try to attract completely new contributors through programs such as Google Summer of Code (GSoC) or Outreachy, but that has its own challenges as well.

With phpMyAdmin we participate regularly in GSoC (we only missed last year because Google didn't choose us that year), and it indeed helps to bring new people on board. Many of them even stay around the project (currently 3 of 5 phpMyAdmin team members are former GSoC students). But I think this approach really works only for bigger organizations.

You can also motivate people with money. This is not used much in free software projects, partly because of a lack of funding (I'll get to that later) and partly because it doesn't necessarily bring long-term contributors, just cash hunters. I've been using Bountysource for some of my projects (Weblate and Gammu), and so far it has mostly worked the other way around: if somebody posts a bounty on an issue, it means the fix is quite important to them, so I use that as an indication for myself. For attracting new developers it has never really worked well, even when I posted bounties on some easy-to-fix issues where newbies could learn our code base and get paid for it. Those issues stayed open for months, and in the end I fixed them myself because they annoyed me.

Don't care too much

I think this is the most important aspect: you simply can never fix all the problems. Let's face it and work accordingly. There can be various levels of not caring. I find it always better to try to encourage people to fix their own problems, but you can't expect a big success rate there, so you might find it not worth the time.

What I currently do:

  • I often ignore direct emails asking me to fix something. The project has a public issue tracker on purpose. Once you solve an issue there, others have a chance to find it when they face a similar problem. Solving things privately in email means you will probably look at similar problems again and again.
  • I try to batch-process things. It is much easier to stay focused when you work on one project and do not switch contexts. This means people will have to wait until you get to their request, but it also means that you will be able to deal with them much more effectively. This is why free hosting requests for Hosted Weblate get processed once a month.
  • I don't care about the number of unread mails, notifications or whatever. Actually, I try not to get many of these at all. This is related to the above: I might do some things once a month (or even less often) and that's still okay. Maybe you're just getting notifications for things you really don't need to be notified about? Do you really need a notification for every new issue? Isn't it better to look at the issue tracker once in a while than to constantly feel the pressure of unread notifications?
  • I don't have to fix every problem. When it seems like something could just as well be fixed by the reporter, I try to give them guidance on how to dig deeper into the issue. Obviously this can't work in all cases, but getting more people on board always helps.
  • I try to focus on things which will save time in the future. Many issues turn out to be just something unclear, and once you figure that out, spend a few more minutes improving your documentation to cover it. It's quite likely that this will save you time in the future.

If you still can't handle it, you should consider abandoning the project as well. Does it bring you anything other than the frustration of uncompleted work? I know it can be a hard decision; in the end it is your child, but sometimes it's the best thing you can do.

Get paid to do the work

Are you doing a full-time job and then working on free software on nights or weekends? It can work for some time, but unless you find some way to make the two match, you will lack free time to relax and spend with friends or family. There are several options to make them work together.

You can find a job where doing free software is a natural part of it. This worked pretty well for me at SUSE, and I'm sure there are more companies where it would work. It can happen that the job will not cover all your free software activities, but that still helps.

You can also make your project your employer. It can be challenging to have volunteers and paid contractors work on one project, but I think it can be handled. Such a setup currently works quite well for phpMyAdmin (we will announce a second contractor soon) and works quite well for me with Weblate as well.

Funding free software projects

Once your project is well funded, you can fix many problems with money. You can pay yourself to do the work, hire additional developers, get better infrastructure or travel to conferences to spread the word about the project. But the question is how to get to the point of being well funded.

There are several crowdfunding platforms which can help you with that (Liberapay, Bountysource Salt, Gratipay or Snowdrift, to mention some). You can also administer the funding yourself or use a legal entity such as Software Freedom Conservancy, which handles this for phpMyAdmin.

But the most important thing is to persuade people and companies to give back. You know there are lots of companies relying on your project, but how do you make them fund it? I really don't know; I still struggle with this, as I don't want to be too pushy in asking for money, but I'd really like to see them give back.

One thing that sort of works is giving your sponsors logo and link placement on your website. If your website is well ranked, you can expect to get quite a lot of SEO sponsors, and the question is where to draw the line of what you still find acceptable. Obviously the companies most willing to pay will have nothing to do with what you do; they just want the link. The industries you can expect are porn, gambling, binary options and various MFA sites. You will get some legitimate sponsors related to your project as well. We felt we had gone too far with phpMyAdmin last year, and we recently tightened the rules, but the outcome is not yet visible on our website (we've only limited new sponsors; existing contracts will be honored).

Another option is to monetize your project more directly. You can offer consulting services or provide it as a service (this is what I currently do with Weblate). It really depends on the product whether you can build a customer base on that, but certainly this is not something that would work well for all projects.

Thanks for reading this, and I hope it's not too chaotic; I moved parts around while writing, and I'm afraid it got too long in the end.

Filed under: Debian English Gammu phpMyAdmin SUSE Weblate | 0 comments

Planet DebianBits from Debian: Build Android apps with Debian: apt install android-sdk

In Debian stretch, the upcoming new release, it is now possible to build Android apps using only packages from Debian. This will provide all of the tools needed to build an Android app targeting the "platform" android-23 using the SDK build-tools 24.0.0. Those two are the only versions of "platform" and "build-tools" currently in Debian, but it is possible to use the Google binaries by installing them into /usr/lib/android-sdk.

This doesn't yet cover all of the libraries that are used in apps, like the Android Support libraries, or all of the other myriad libraries that are usually fetched from jCenter or Maven Central. One big question for us is whether and how libraries should be included in Debian. All the Java libraries in Debian can be used in an Android app, but including something like Android Support in Debian would be strange, since it is only useful in an Android app, never for a Debian app.

Building apps with these packages

Here are the steps for building Android apps using Debian's Android SDK on Stretch.

  1. sudo apt install android-sdk android-sdk-platform-23
  2. export ANDROID_HOME=/usr/lib/android-sdk
  3. In build.gradle, set compileSdkVersion to 23 and buildToolsVersion to 24.0.0
  4. run gradle build

The Gradle Android Plugin is also packaged. Using the Debian package instead of the one from online Maven repositories requires a little configuration before running gradle. In the buildscript block:

  • add maven { url 'file:///usr/share/maven-repo' } to repositories
  • use compile '' to load the plugin
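Putting the two items together, the buildscript block would look roughly like this. This is only a sketch: the plugin's Maven coordinates are elided in the post above, so the dependency line below is a placeholder, and `classpath` is the conventional way of loading a Gradle plugin in a buildscript block.

```groovy
buildscript {
    repositories {
        // Debian's local Maven repository instead of jcenter()/mavenCentral()
        maven { url 'file:///usr/share/maven-repo' }
    }
    dependencies {
        // the Gradle Android Plugin as packaged in Debian
        // (coordinates elided in the post above -- fill in the artifact here)
        classpath ''
    }
}
```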

Currently only the API Level 23 target platform is packaged, so only apps targeted at android-23 can be built with only Debian packages. There are plans to add more API platform packages via backports. Only build-tools 24.0.0 is available, so build scripts need to be modified to use that version of the SDK. Beware that Lint in this version of the Gradle Android Plugin is still problematic, so running the :lint tasks might not work. They can be turned off with lintOptions.abortOnError in build.gradle. Google binaries can be combined with the Debian packages, for example to use a different version of the platform or build-tools.

Why include the Android SDK in Debian?

While Android developers could develop and ship apps right now using these Debian packages, this is not very flexible since only build-tools-24.0.0 and android-23 platform are available. Currently, the Debian Android Tools Team is not aiming to cover the most common use cases. Those are pretty well covered by Google's binaries (except for the proprietary license on the Google binaries), and are probably the most work for the Android Tools Team to cover. The current focus is on use cases that are poorly covered by the Google binaries, for example, like where only specific parts of the whole SDK are used. Here are some examples:

  • tools for security researchers, forensics, reverse engineering, etc. which can then be included in live CDs and distros like Kali Linux
  • a hardened APK signing server using apksigner that uses a standard, audited, public configuration of all reproducibly built packages
  • Replicant is a 100% free software Android distribution, so of course they want to have a 100% free software SDK
  • high security apps need a build environment that matches their level of security, the Debian Android Tools packages are reproducibly built only from publicly available sources
  • support architectures besides i386 and amd64, for example, the Linaro LAVA setup for testing ARM devices of all kinds uses the adb packages on ARM servers to make their whole testing setup all ARM architecture
  • dead simple install with strong trust path with mirrors all over the world

In the long run, the Android Tools Team aims to cover more use cases well, and also building the Android NDK. This all will happen more quickly if there are more contributors on the Android Tools team! Android is the most popular mobile OS, and can be 100% free software like Debian. Debian and its derivatives are one of the most popular platforms for Android development. This is an important combination that should grow only more integrated.

Last but not least, the Android Tools Team wants feedback on how this should all work, for example, ideas for how to nicely integrate Debian's Java libraries into the Android gradle workflow. And ideally, the Android Support libraries would also be reproducibly built and packaged somewhere that enforces only free software. Come find us on IRC and/or email!

Worse Than FailureCodeSOD: The Tokens That Wouldn’t Die


Sacha received custody of a legacy Python API, and was tasked with implementing a fresh version of it.

He focused on authentication first. The existing API used JSON web tokens that, for some reason, never expired. Assignments like expiration=86400 and expiration=3600 were littered throughout the code, but seemed to go ignored.

It didn't take long to track down the token generating code and figure out the tokens' source of (near) immortality:

expInTS = calendar.timegm(time.gmtime())
expiration_seconds = 86400
expiration = (datetime.datetime.now() + datetime.timedelta(seconds=expiration_seconds))
return {'status': True,
        "auth_token": user.generate_auth_token(expiration=expInTS),
        'code': code,
        "token_expiration": expiration.strftime('%Y-%m-%dT%H:%M:%S'),
        'user': user.to_json()}, 200

Several expiration-related variables are set up at first, and even the original coder seemed to have gotten confused by them. When generating the token, he or she used expInTS for the expiration value instead of expiration. The problem is that expInTS is set to the current Unix timestamp—which is the number of seconds that have transpired since January 1, 1970.

The slip was confirmed when Sacha looked at a token header:

 alg: "HS256",
 exp: 2977106874,
 iat: 1488553437

iat (issued at) shows the Unix timestamp when the token was created. The timestamp was then added to itself, resulting in the timestamp for expiration shown in exp. That timestamp corresponds to May 4, 2064, a date by which most of us will be dead or retired.
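The arithmetic is easy to check. Here is a small sketch of the bug (function and variable names are mine, not from Sacha's code base): the token library treats its expiration parameter as a lifetime in seconds to add to the issue time, so passing the current Unix timestamp doubles it.

```python
def token_times(issued_at, lifetime_seconds):
    """Mimic a JWT library computing its claims: exp is simply the
    issue time plus the requested lifetime in seconds."""
    return {'iat': issued_at, 'exp': issued_at + lifetime_seconds}

iat = 1488553437                  # the timestamp from the token header above
buggy = token_times(iat, iat)     # passing a timestamp as the "lifetime"
fixed = token_times(iat, 86400)   # passing one day, as intended

print(buggy['exp'])   # → 2977106874, i.e. May 2064
print(fixed['exp'])   # → 1488639837, one day after issuance
```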

Profound, yes, but not exactly desirable. Sacha adjusted the expiration value to 86400 seconds (1 day), then moved along.

[Advertisement] Release! is a light card game about software and the people who make it. Play with 2-5 people, or up to 10 with two copies - only $9.95 shipped!

Planet Linux AustraliaOpenSTEM: Am I a Neanderthal?

Early reconstruction of Neanderthal

The whole question of how Neanderthals are related to us (modern humans) has been controversial ever since the first Neanderthal bones were found in Germany in the 19th century. Belonging to an elderly, arthritic individual (a good example of how well Neanderthals cared for each other in social groups), the bones were reconstructed to show a stooping individual, with a more ape-like gait, leading to Neanderthals being described as the “Missing Link” between apes and humans, and given the epithet “ape-man”.

Who were the Neanderthals?

Modern reconstruction – Smithsonian Museum of Natural History

Neanderthals lived in the lands surrounding the Mediterranean Sea, and as far east as the Altai Mountains in Central Asia, between about 250,000 and about 30,000 years ago. They were a form of ancient human with certain physical characteristics – many of which probably helped them cope with the cold of Ice Ages. Neanderthals evolved out of an earlier ancestor, Homo erectus, possibly through another species – Homo heidelbergensis. They had a larger brain than modern humans, but it was shaped slightly differently, with less development in the prefrontal cortex, which allows critical thinking and problem-solving, and larger development at the back of the skull, and in areas associated with memory in our brains. It is possible that Neanderthals had excellent memory, but poor analytical skills. They were probably not good at innovation – a skill which became vital as the Ice Age ended and the global climate warmed, sea levels rose and plant and animal habitats changed.

Neanderthals were stockier than modern humans, with shorter arms and legs, and probably stronger and all-round tougher. They had a larger rib cage, and probably bigger lungs, a bigger nose, larger eyes and little to no chin. Most of these adaptations would have helped them in Ice Age Europe and Asia – a more compact body stayed warmer more easily and was tough enough to cope with a harsh environment. Large lungs helped oxygenate the blood and there is evidence that they had more blood supply to the face – so probably had warm, ruddy cheeks. The large nose warmed up the air they breathed, before it reached their lungs, reducing the likelihood of contracting pneumonia. Neanderthals are known to have had the same range of hair colours as modern humans and fair skin, red hair and freckles may have been more common.

They made stone tools, especially those of the type called Mousterian, constructed simple dwellings and boats, made and used fire, including for cooking their food, and looked after each other in social groups. Evidence of skeletons with extensive injuries occurring well before death, shows that these individuals must have been cared for, not only whilst recovering from their injuries, but also afterwards, when they would probably not have been able to obtain food themselves. Whether or not Neanderthals intentionally buried their dead is an area of hot controversy. It was once thought that they buried their dead with flowers in the grave, but the pollen was found to have been introduced accidentally. However, claims of intentional burial are still debated from other sites.

What Happened to the Neanderthals?

Abrigo do Lagar Velho

Anatomically modern humans emerged from Africa about 100,000 years ago. Recent studies of human genetics suggest that modern humans had many episodes of mixing with various lineages of human ancestors around the planet. Modern humans moved into Asia and Europe during the Ice Age, expanding further as the Ice Age ended. Modern humans overlapped with Neanderthals for about 60,000 years, before the Neanderthals disappeared. It is thought that a combination of factors led to the decline of Neanderthals. Firstly, the arrival of modern humans, followed by the end of the Ice Age, brought about a series of challenges which Neanderthals might have been unable to adapt to as quickly as necessary. Modern humans have more problem-solving and innovation capability, which might have meant that they were able to out-compete Neanderthals in a changing environment. The longest-held theory is that our ancestors wiped out the Neanderthals in the first genocide in (pre)history. A find of a group of Neanderthals, across a range of ages, some from the same family group, who all died at the same time, is one of the sites which might support this theory, although we don’t actually know who (or what) killed the group. Cut marks on their bones show that they were killed by something using stone tools. Finally, there is more and more evidence of what are called “transitional specimens”. These are individuals who have physical characteristics of both groups, and must represent inter-breeding. An example is the 4 year old child from the site of Abrigo do Lagar Velho in Portugal, which seems to have a combination of modern and Neanderthal features. The discovery of Neanderthal genes in many modern people living today is also proof that we must have interbred with Neanderthals in the past. It is thought that the genes were mixed several times, in several parts of the world.

Am I a Neanderthal?

So how do we know if we have Neanderthal genes? Neanderthal genes are associated with some physical characteristics, but also with other attributes that we can’t see. In terms of physical characteristics, Neanderthal aspects to the skull include brow ridges (ridges of bone above the eyes, under the eyebrows); a bump on the back of the head – called an occipital chignon, or bun, because it looks like a ‘bun’ hairstyle, built into the bone; a long skull (like Captain Jean-Luc Picard from Star Trek – actor Patrick Stewart); a small, or non-existent chin; a large nose; a large jaw with lots of space for wisdom teeth; wide fingers and thumbs; thick, straight hair; large eyes; red hair, fair skin and freckles! The last may seem a little surprising, but it appears that the genes for these characteristics came from Neanderthals – who had a wide range of hair colours, fair skin and, occasionally, freckles. Increased blood flow to the face also would have given Neanderthals lovely rosy cheeks!

Less obvious characteristics include resistance to certain diseases – parts of our immune systems, especially with reference to European and Asian diseases; less positively, an increased risk of other diseases, such as type 2 diabetes. Certain genes linked to depression are present, but ‘switched off’ in Neanderthals. The way that these genes link to depression, and their role in the lifestyles of early people (where they may have had benefits that are no longer relevant) are future topics for research and may help us understand more about ourselves.

Neanderthal genes are present in modern populations from Europe, Asia, Northern Africa, Australia and Oceania. So, depending on which parts of the world our ancestry is from, we may have up to 4% of our genetics from long-dead Neanderthal ancestors!


Planet Linux AustraliaCraige McWhirter: Japanese House in Python for Minecraft

I have kids that I'm teaching to hack. They started off on Scratch (which is excellent) and are ready to move up to Python. They also happen to be mad Minecraft fans, so now they're making their way through Adventures in Minecraft.

As I used Scratch when they were learning it, I'm also hacking in Python & Minecraft as they are. It helps if I hit the bumps and hurdles before they do, as well as having a sound handle on the problems they're solving.

Now I've branched out from the tutorial and I'm just having fun with it and leaving behind code the kids can use, hack on, whatever. This code is in my minecraft-tools repo (for want of a better name). It's just a collection of random tools I've written for Minecraft that aren't quite up to being their own thing. I expect this will mostly be a collection of python programs to construct things inside Minecraft via CanaryMod and CanaryRaspberryJuicePlugin.

The first bit of code to be shaken out of the tree produces a Minecraft interpretation of a classic Japanese house. Presently it only produces a single configuration, which is little more than an empty shell.
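
The core of a structure like this is just computing the coordinates of a hollow box and placing a block at each one. Here's a rough sketch of that idea (the function name and block ID are mine, not from the repo; the commented `mc.setBlock()` usage is the mcpi-style API that CanaryRaspberryJuicePlugin exposes):

```python
def shell_blocks(x0, y0, z0, width, height, depth):
    """Return the coordinates of the floor, walls and roof of a hollow box."""
    blocks = []
    for dx in range(width):
        for dy in range(height):
            for dz in range(depth):
                # a block is part of the shell if it lies on any face of the box
                if dx in (0, width - 1) or dy in (0, height - 1) or dz in (0, depth - 1):
                    blocks.append((x0 + dx, y0 + dy, z0 + dz))
    return blocks

# With an mcpi-style connection (hypothetical usage, not from the repo):
#   from mcpi.minecraft import Minecraft
#   mc = Minecraft.create()
#   for x, y, z in shell_blocks(0, 10, 0, 11, 5, 9):
#       mc.setBlock(x, y, z, 5)  # 5 = wooden planks
```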

Japanese House (day) Japanese House (night)

I intend to add an interior fit-out plus a whole bunch of optional configurations you can set at run time, but for now it is what it is. Next I'm going to move on to writing geodesic domes and transport / teleport rings (as per The Expanse), which will eventually lead to coding a TARDIS that will, you know, be actually bigger on the inside ;-)

Sam VargheseAustralia taking a big risk by playing Cummins

AUSTRALIA is likely to regret pushing Patrick Cummins into Test cricket before he has had a chance to play at least one season of matches in the Sheffield Shield to test out his body.

That Australia is not good at monitoring its players is evident from Mitchell Starc’s breaking down in India. Starc was ruled out of the India series after two Tests, with a stress fracture in his right foot.

As the cricket website espncricinfo has detailed, Starc is no stranger to injuries: he has been suffering from a spate of them right from December 2012.

If the Australian team doctors and physiotherapist could not monitor him enough to prevent his breaking down in what is billed as a series that is even more important than the Ashes, then what hope for Cummins?

Cummins made a spectacular debut in South Africa in 2011, but thereafter he has been hit by injuries one after the other. He made a good showing in the recent Big Bash League, but one has to bowl just four overs per game in that league.

He also played in the one-dayers against Pakistan, but again that is a matter of bowling a maximum of 10 overs.

And one must bear in mind that Cummins’ outings in T20 and ODIs have both been on Australian pitches which are firm and provide good support for fast bowlers as they pound their way up to the crease.

Indian pitches are a different kettle of fish. The soil is loose, and additionally the curators are dishing up spinning surfaces that will help the home team. Nothing wrong with that, every country does it.

But what needs to be noted is that loose soil does not give a fast bowler a good grip as he storms up to the crease. Sawdust does not help much either unless there is a firm foundation.

Cummins has looked good for some time now. But pitching him into the cauldron that is the Australia-India series, especially at this stage, does not seem to be a very sensible thing to do.

Cricket Australia may well like to retain the Border-Gavaskar trophy but should it take a risk with Cummins who is an excellent long-term prospect?

Fingers crossed that one of the faster of today’s bowlers gets through the two remaining Tests in India without anything going wrong. But one has serious doubts on that score.

Krebs on SecurityAdobe, Microsoft Push Critical Security Fixes

Adobe and Microsoft each pushed out security updates for their products today. Adobe plugged at least seven security holes in its Flash Player software. Microsoft, which delayed last month’s Patch Tuesday until today, issued an unusually large number of update bundles (18) to fix dozens of flaws in Windows and associated software.

Microsoft’s patch to fix at least five critical bugs in the Windows file-sharing service is bound to make a lot of companies nervous before they get around to deploying this week’s patches. Most organizations block internal file-sharing networks from talking directly to their Internet-facing networks, but these flaws could be exploited by a malicious computer worm to spread very quickly once inside an organization with a great many unpatched Windows systems.

Another critical patch (MS17-013) covers a slew of dangerous vulnerabilities in the way Windows handles certain image files. Malware or miscreants could exploit the flaws to foist malicious software without any action on the part of the user, aside from perhaps just browsing to a hacked or booby-trapped Web site.

According to a blog post at the SANS Internet Storm Center, the image-handling flaw is one of six bulletins Microsoft released today which include vulnerabilities that have either already been made public or that are already being exploited. Several of these are in Internet Explorer (CVE-2017-0008/MS17-006) and/or Microsoft Edge (CVE-2017-0037/MS17-007).

For a more in-depth look at today’s updates from Microsoft, check out this post from security vendor Qualys.

And as per usual, Adobe used Patch Tuesday as an occasion to release updates for its Flash Player software. The latest update brings Flash to v. for Windows, Mac and Linux users alike. If you have Flash installed, you should update, hobble or remove Flash as soon as possible. To see which version of Flash your browser may have installed, check out this page.

The smartest option is probably to ditch the program once and for all and significantly increase the security of your system in the process. An extremely powerful and buggy program that binds itself to the browser, Flash is a favorite target of attackers and malware. For some ideas about how to hobble or do without Flash (as well as slightly less radical solutions) check out A Month Without Adobe Flash Player.

If you choose to keep Flash, please update it today. The most recent versions of Flash should be available from the Flash home page. Windows users who browse the Web with anything other than Internet Explorer may need to apply this patch twice, once with IE and again using the alternative browser (e.g. Firefox, Opera).

Chrome and IE should auto-install the latest Flash version on browser restart (users may need to manually check for updates and/or restart the browser to get the latest Flash version). When in doubt, click the vertical three dot icon to the right of the URL bar, select “Help,” then “About Chrome”: if there is an update available, Chrome should install it then.

Finally, Adobe also issued a patch for its Shockwave Player, which is another program you should probably ditch if you don’t have a specific need for it. The long and short of it is that Shockwave often contains the same exploitable Flash bugs but doesn’t get patched anywhere near as often as Flash. Please read Why You Should Ditch Adobe Shockwave if you have any doubts on this front.

As always, if you experience any issues downloading or installing any of these updates, please leave a note about it in the comments below.

Planet DebianKeith Packard: Valve

Consulting for Valve in my spare time

Valve Software has asked me to help work on a couple of Linux graphics issues, so I'll be doing a bit of consulting for them in my spare time. It should be an interesting diversion from my day job working for Hewlett Packard Enterprise on Memory Driven Computing and other fun things.

First thing on my plate is helping support head-mounted displays better by getting the window system out of the way. I spent some time talking with Dave Airlie and Eric Anholt about how this might work and have started on the kernel side of that. A brief synopsis is that we'll split off some of the output resources from the window system and hand them to the HMD compositor to perform mode setting and page flips.

After that, I'll be working out how to improve frame timing reporting back to games from a composited desktop under X. Right now, a game running on X with a compositing manager can't tell when each frame was shown, nor accurately predict when a new frame will be shown. This makes smooth animation rather difficult.

CryptogramDigital Security Exchange: Security for High-Risk Communities

I am part of this very interesting project:

For many users, blog posts on how to install Signal, massive guides to protecting your digital privacy, and broad statements like "use Tor" -- all offered in good faith and with the best of intentions -- can be hard to understand or act upon. If we want to truly secure civil society from digital attacks and empower communities in their fight to protect their rights, we've got to recognize that digital security is largely a human problem, not a technical one. Taking cues from the experiences of the deeply knowledgeable global digital security training community, the Digital Security Exchange will seek to make it easier for trainers and experts to connect directly to the communities in the U.S. -- sharing expertise, documentation, and best practices -- in order to increase capacity and security across the board.

Planet DebianJohn Goerzen: Parsing the GOP’s Health Insurance Statistics

There has been a lot of noise lately about the GOP health care plan (AHCA) and the differences to the current plan (ACA or Obamacare). A lot of statistics are being misinterpreted.

The New York Times has an excellent analysis of some of this. But to pick it apart, I want to highlight a few things:

Many Republicans are touting the CBO’s estimate that, some years out, premiums will be 10% lower under their plan than under the ACA. However, this carries with it a lot of misleading information.

First of all, many are spinning this as if costs would go down. That’s not the case. The premiums would still rise — they would just have risen less by the end of the period than under the ACA. That also ignores the immediate spike, and the millions thrown out of the insurance marketplace altogether.

Now then, where does this 10% number come from? First of all, you have to understand that older people are substantially more expensive to the health system, and therefore more expensive to insure. ACA limited the price differential from the youngest to the oldest people, which meant that in effect some young people were subsidizing older ones on the individual market. The GOP plan removes that limit. Combined with other changes in subsidies and tax credits, this dramatically increases the cost to older people. For instance, the New York Times article cites a CBO estimate that “the price an average 64-year-old earning $26,500 would need to pay after using a subsidy would increase from $1,700 under Obamacare to $14,600 under the Republican plan.”

They further conclude that these exceptionally high rates would be so unaffordable to older people that the older people will simply stop buying insurance on the individual market. This means that the overall risk pool of people in that market is healthier, and therefore the average price is lower.

So, to sum up: the reason that insurance premiums under the GOP plan will rise at a slightly slower rate long-term is that the higher-risk people will be unable to afford insurance in the first place, leaving only the cheaper people to buy in.

CryptogramRansomware for Sale

Brian Krebs posts a video advertisement for Philadelphia, a ransomware package that you can purchase.

Worse Than FailureFrayed Fiber

The 80's were a time of great technological marvels. The Walkman allowed a person to listen to music anywhere they went. The Video Cassette Recorder allowed you to watch your favorite movies repeatedly until they wore out. Then there was the magic of Fiber Optics. Advances in the light-blasted-through-glass medium allowed places like Seymour's company to share data between offices at blistering speeds.

Bill, the President of Seymour's company, always wanted them to be on the cutting edge of technology. He didn't always know the why or the how surrounding it, but when he heard about something that sounded cool, he wanted to be the first company to have it. That's where Seymour came in. As Vice President of Technological Development (a fancy job title he got for being the organization's only true techie) he made Bill's dreams come true. All he had to do was ask for the company credit card.

an illuminated bundle of fiber optic cable

When Bill caught wind of fiber optics at a trade show, he came back to the office ranting and raving about it. "Seymour, we've got to link the offices up with these fiber optical things!" he shouted with enthusiasm. Since their buildings were a mere three miles apart it seemed like overkill, but Seymour was bored and needed a new project. "I've had it with these slow noisy modem things we use to exchange data! I want you to weave these fibers into our computers. You can start today!"

Seymour had to calm Bill down and explain to him what a big ordeal getting set up on fiber would be. Since there weren't any existing lines in town, one would have to be routed underground on the route between offices. Seymour got in contact with local utility and telecommunications companies and the initiative was underway.

Fast-forwarding eight months, Seymour's fiber connection was a success. The cranky old modems had been mothballed and were a distant memory. Files and reports were being sent between offices at literal light-speed. Bill made it worth all the trouble with a sizable deposit into Seymour's bank account and his own company credit card. But then one day things went awry.

Seymour's phone rang at 6:30 one morning. Bill, always the early arriver, was on the other end in a panic. "Seymour! You need to get here right now! The fibers are cooked and we can't download anything to the other office!" Seymour quickly threw on some clothes and got in his car. His commute took longer than normal because of some irritating utility work slowing down traffic but he was sure he'd have it solved in no time.

Upon arrival, he took out his trusty fiber testing kit and hooked it up to one of the pairs. Nothing. He tried the next pair. Nothing. The other 13 pairs yielded the same result. "What in the hell?" he thought to himself, with Bill hovering over his shoulder. Further inspections showed nothing was wrong with their equipment in the building.

"Seymour, this isn't acceptable!" Bill bellowed to him, growing sweatier by the minute. "First it takes you forever to get here, now you don't have any answers for me!"

"I'm sorry, Bill. I got here as soon as I could. There was this damned utility work in the way..." Seymour cut himself off as an illuminated fiber light went off in his head. "I'll be right back!" Seymour ran out to his car to drive back the way he came. The route he took to work also happened to share some of the fiber line's route.

He stopped at the dig site to find it mostly cleaned up with one construction worker remaining. Inspecting the ground, he found the utility company had done their work spray painting the correct areas not to dig. Green here, for the sewer, yellow for natural gas over there, and a communications line there. A new utility pole stood proudly, far away from any of the marked areas.

Well, it was a good thought, anyway. Seymour ducked under the pole's anchor cable and started back to his car- then stopped. He looked at the anchor cable, and tracked its end down into the orange spray-paint that marked a communication line. He bent down for a closer look and found shredded bits of fiber optic cable. Bingo. He flagged down the last remaining worker to point it out, "Excuse me, sir. I think there's been an accident. This line here was essential to my company's computer system."

The portly man in a hard hat sauntered over, unconcerned. "Wut? This here thing? Ain't nothin but a bundle of fishing line some'un went an buried fer some reason. This ain't no computer."

"Oh, right... My mistake," Seymour offered a token apology and decided he wasn't going to get through to this particular city worker. He drove back to the office and filled Bill in on the mishap. Bill's anger was quickly channeled into an unfriendly phone call to city hall and within 24 hours Seymour's incredible fiber line was back in service. After all the effort the past several months, a getaway to use actual fishing line for its intended purpose sounded like something Seymour badly needed.


Planet Linux AustraliaDavid Rowe: Testing FreeDV 700C

Since releasing FreeDV 700C I’ve been “instrumenting” the FreeDV GUI program – adding some code to perform various tests of the 700C waveform, especially over the air.

With the kind help of Gerhard OE3GBB, Mark VK5QI, and Peter VK5APR, I have collected some samples and performed some tests. The goals of this work were:

  1. Compare 700C Over the Air (OTA) to simulation on an AWGN channel.
  2. Compare 700C OTA to SSB on an AWGN channel.


Here is a screen shot of the latest FreeDV GUI Options screen:

I’ve added some features to the top three rows:

Test Frames: Send a payload of known test bits rather than vocoder bits
Channel Noise: Simulate a channel using AWGN noise
SNR: SNR of the simulated AWGN noise
Attn Carrier: Attenuate just one carrier
Carrier: The 700C carrier (1-14) to attenuate
Simulated Interference Tone: Enable an interfering sine wave of specified frequency and amplitude
Clipping: Enable clipping of the 700C tx waveform, to increase RMS power
Diversity Combine for plots: Scatter and Test Frame plots use combined (7 carrier) information

To explore these options it is useful to run in full duplex mode (Tools-PTT Half Duplex unchecked) and using a loopback sound device:

  $ sudo modprobe snd-aloop

More information on loopback in the FreeDV GUI README.

Clipping the 700C tx waveform reduces the Peak to Average Power Ratio (PAPR), which may result in a higher average power over the channel. However clipping distorts the waveform and adds some “shoulders” (i.e. noise) to the spectrum adjacent to the 700C waveform:

Several users have noticed this distortion. At this stage I’m unsure if clipping is useful or not.
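
A toy numerical illustration of the trade-off (this is just the general PAPR mechanism, not the actual 700C clipper):

```python
import math

def papr_db(samples):
    """Peak to Average Power Ratio of a real signal, in dB."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(peak / rms)

def clip(samples, threshold):
    """Hard-limit the waveform at +/- threshold."""
    return [max(-threshold, min(threshold, s)) for s in samples]

# A two-tone signal stands in for a multi-carrier waveform: the tones
# occasionally add in phase, giving a high peak relative to the average.
n = 1000
sig = [math.sin(2 * math.pi * 5 * t / n) + math.sin(2 * math.pi * 7 * t / n)
       for t in range(n)]

print(papr_db(sig))             # high PAPR before clipping
print(papr_db(clip(sig, 1.0)))  # noticeably lower PAPR after clipping
```

For a fixed peak power (set by the transmitter PA), a lower PAPR means a higher average power over the channel; the cost is the distortion and spectral shoulders described above.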

The Diversity Combine option is useful to explore each of the 14 carriers separately before they are combined into 7 carriers.

Many of these options were designed to explore tx filtering. I have long wondered if any of the FreeDV carriers were receiving less power than others, for example due to ripple or a low pass response from the crystal filter. A low power carrier would have a high bit error rate, adversely affecting overall performance. Plotting the scatter diagram or bit error rate on a carrier by carrier basis can measure the effect of tx filtering – if it exists.

Some of the features above – like attenuating a single carrier – were designed to “test the test”. Much of the work I do on FreeDV (and indeed other projects) involves carefully developing software and writing “code to test the code”. For example, to build the experiments described in this blog post I worked several hours a day for several weeks. Not glamorous, but that is where the real labour lies in R&D. Careful, meticulous testing and experimentation. One percent inspiration … then code, test, test.

Comparing Analog SSB to Digital Voice

One of my goals is to develop a HF DV system that is competitive with analog SSB. So we need a way to compare analog and DV at the same SNR. So I came up with the idea of wave files of analog SSB and DV which have the same average (RMS) power. If these are fed into an SSB transmitter, they will be received at the same SNR. I added 10 seconds of a 1000Hz sine wave at the start for good measure – this could be used to measure the actual SNR.

I developed two files:

  1. sine_analog_700c
  2. sine_analog_testframes700c

The first has the same voice signal in analog and 700C, the second uses test frames instead of encoded voice.
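
The RMS-equalisation step can be sketched like this (a simplified version of the idea, working on normalised float samples rather than the actual wave files):

```python
import math

def rms(samples):
    """Root-mean-square (average) level of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def match_rms(samples, target_rms):
    """Scale a signal so its RMS level equals target_rms."""
    gain = target_rms / rms(samples)
    return [gain * s for s in samples]

# Hypothetical stand-ins for the SSB and 700C sample streams:
ssb = [0.8 * math.sin(0.01 * i) for i in range(1000)]
dv = [0.3 * math.sin(0.02 * i) for i in range(1000)]

dv_matched = match_rms(dv, rms(ssb))
# Both signals now present the same average power to the transmitter,
# so over the same channel they will be received at the same SNR.
```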

Interfering Carriers

One feature described above simulates an interfering carrier (like a birdie), something I have seen on the air. Here is a plot of a carrier in the middle of one of the 700C carriers, but about 10dB higher:

The upper RH plot is a rolling plot of bit errors for each carrier. You can see one carrier is really messed up – lots of bit errors. The average bit error rate is about 1%, which is where FreeDV 700C starts to become difficult to understand. These bit errors would not be randomly distributed, but would affect one part of the codec all the time. For example the pitch might be consistently wrong, or part of the speech spectrum. I found that as long as the interfering carrier is below the FreeDV carrier, the effect on bit error rate is negligible.

Take away: The tx station must tune away from any interfering carriers that poke above the FreeDV signal carriers. Placing the interfering tones between FreeDV carriers is another possibility, e.g. a 50Hz shift of the tx signal.

Results – Transmit Filtering

Simulation results suggest 700C should produce reasonable results near 0db SNR. So that’s the SNR I’m shooting for in Over The Air (OTA) testing.

Mark VK5QI sent me several minutes of test frames so I could determine if there were any carriers with dramatically different bit error rates, which would indicate the presence of some tx filtering. Here is the histogram of BERs for each carrier for Mark’s signal, which was at about 3dB SNR:

There is one bar for each I and Q QPSK bit of the 14 carriers – 28 bars total (note Diversity combination was off). After running for a few minutes, we can see a range of 5E-2 to 8E-2 (5 to 8%). In terms of AWGN modem performance, this is only about 1dB difference in SNR or Eb/No, as per the BER versus Eb/No graphs in this post on the COHPSK modem used for 700C. One carrier being pinned at say 20% BER, or a slope of increasing BER with carrier frequency, would have meant tx filtering trouble.
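
The rough size of that spread can be sanity-checked against the ideal QPSK-on-AWGN curve (the textbook formula, not the measured COHPSK curve from the post linked above):

```python
import math

def qpsk_ber(ebno_db):
    """Ideal QPSK bit error rate on an AWGN channel."""
    ebno = 10 ** (ebno_db / 10)
    return 0.5 * math.erfc(math.sqrt(ebno))

def ebno_for_ber(target_ber, lo=-5.0, hi=15.0):
    """Invert the BER curve by bisection (BER falls monotonically with Eb/No)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if qpsk_ber(mid) > target_ber:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Eb/No gap between a 5% and an 8% bit error rate:
spread_db = ebno_for_ber(0.05) - ebno_for_ber(0.08)
print(round(spread_db, 2))  # ≈ 1.37, i.e. a bit over 1 dB
```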

Peter VK5APR, sent me a high SNR signal (he lives just 4 km away). Initially I could see a X shaped scatter diagram, a possible sign of tx filtering. However this ended up being some amplitude pumping due to Fast AGC on my radio. After I disabled fast AGC, I could see a scatter diagram with 4 clear dots, and no X-shape. Check.

I performed an additional test using my IC7200 as a transmitter, and a HackRF as a receiver. The HackRF has no crystal filter and a very flat response, so any distortion would be due to the IC7200 transmit filtering. Once again – 4 clean dots on the scatter diagram and no X-shape.

So I am happy to conclude that transmit filtering does not seem to be a problem, at least of the radios tested. All performance issues are therefore likely to be caused by me and my algorithms!

Results – Low SNR testing

Peter, VK5APR, configured his station to send the analog/700C equi-power test wave files described above at very low power, such that the received SNR at my station was about 0dB. As we are so close it was reasonable to assume the channel was AWGN, indeed we could see no sign of NVIS fading and the band (40M) was devoid of DX at the 12 noon test time.

Here is the rx signal I received, and the same file run through the 700C decoder. Neither the SSB nor the decoded 700C audio is pretty. However it’s fair to say we could just get a message through in both modes, and that 700C is holding its own next to SSB. The results are close to my simulations, which was the purpose of this test.

You can decode the off air signal yourself if you download the first file and replay it through the FreeDV GUI program using “Tools – Start/Stop Play File from Radio”.


While setting up these tests, Peter and I conversed comfortably for some time over FreeDV 700C at a high SNR. This proved to me that for our audience (experienced users of HF radio) – FreeDV 700C can be used for conversational contacts. Given the 700C codec is really just a first pass – that’s a fine result.

However it’s a near thing – the 700C codec adds a lot of distortion just compressing the speech. It’s pretty bad even if the SNR is high. The comments on the Codec 2 700C blog post indicate many lay-people can’t understand speech compressed by 700C. Add any bit errors (due to low SNR or fading) and it quickly becomes hard to understand – even for experienced users. This makes 700C very sensitive to bit errors as the SNR drops. But hey – every one of those 28 bits/frame counts at 700 bit/s so it’s not surprising.

In contrast, SSB scales a bit better with SNR. However even at high SNRs, that annoying hiss is always there – which is very fatiguing. Peter and I really noticed that hiss after a few minutes back on SSB. Yuck.

SSB gets a lot of its low SNR “punch” from making effective use of peak power. Here is a plot of the received SSB:

It’s all noise except for the speech peaks, where the “peak SNR” is much higher than 0dB. Our brains are adept at picking out words from those peaks, integrating the received phonetic symbols (mainly vowel energy) in our squishy biological receive filters. It’s a pity we didn’t evolve to detect coherent PSK. A curse on your evolution!

In contrast – 700C allocates just as much power to the silence between words as to the most important parts of speech. This suggests we could do a better job at tailoring the codec and modem to peak power, e.g. allocating more power to parts of the speech that really matter. I had a pass at Time Variable Quantisation a few years ago. A variable rate codec might be called for, tightly integrated to the modem to pack more bits/power into perceptually important parts of speech.

The results above assumed equal average power for SSB and FreeDV 700C. It’s unclear if this happens in the real world. For example we may need to “back off” FreeDV drive further than SSB; SSB may use a compressor; and the PAs we are using are generally designed for PEP rather than average power operation.

Next Steps

I’m fairly happy with the baseline COHPSK modem, it seems to be hanging on OK as long as there aren’t any co-channel birdies. The 700C codec works better than expected, has plenty of room for improvement – but it’s sensitive to bit errors. So I’m inclined to try some FEC next. Aim for error free 700C at 0dB, which I think will be superior to SSB. I’ll swap out the diversity for FEC. This will increase the raw BER, but allow me to run a serious rate 0.5 code. I’ll start just with an AWGN channel, then tackle fading channels.


FreeDV 700C
Codec 2 700C

Planet DebianReproducible builds folks: Reproducible Builds: week 98 in Stretch cycle

Here's what happened in the Reproducible Builds effort between Sunday March 5 and Saturday March 11 2017:

Upcoming events

Reproducible Builds Hackathon Hamburg

The Reproducible Builds Hamburg Hackathon 2017, or RB-HH-2017 for short, is a 3 day hacking event taking place in the CCC Hamburg Hackerspace located inside the Frappant, which is a collective art space located in a historical monument in Hamburg, Germany.

The aim of the hackathon is to spend some days working on Reproducible Builds in every distribution and project. The event is open to anybody interested in working on Reproducible Builds issues in any distro or project, with or without prior experience!

Packages filed

Chris Lamb:

Toolchain development

  • Guillem Jover uploaded dpkg 1.18.23 to unstable, declaring .buildinfo format 1.0 as "stable".

  • James McCoy uploaded devscripts 2.17.2 to unstable, adding support for .buildinfo files to the debsign utility via patches from Ximin Luo and Guillem Jover.

  • Hans-Christoph Steiner noted that the first reproducibility-related patch in the Android SDK was marked as confirmed.

Reviews of unreproducible packages

39 package reviews have been added, 7 have been updated and 9 have been removed in this week, adding to our knowledge about identified issues.

2 issue types have been added:

Weekly QA work

During our reproducibility testing, FTBFS bugs have been detected and reported by:

  • Chris Lamb (2)

reproducible-website development

  • Hans-Christoph Steiner gave a progress report on testing F-Droid: we now have a complete vagrant workflow working in nested KVM! So we can provision a new KVM guest, then package it using vagrant box all inside of a KVM guest (which is a profitbricks build node). So we finally have a working setup. Next up is fixing bugs in our libvirt snapshotting support.
  • Then Hans-Christoph was also able to enable building of all F-Droid apps in our setup, though this is still work in progress…
  • Daniel Shahaf spotted a subtle error in our FreeBSD sudoers configuration and as a result the FreeBSD reproducibility results are back.
  • Holger once again adjusted the Debian armhf scheduling frequency, to cope with the ever increasing amount of armhf builds.
  • Mattia spotted a refactoring error which resulted in no maintenance mails for a week.
  • Holger also spent some time on improving IRC notifications further, though there are still some improvements to be made.


This week's edition was written by Chris Lamb, Holger Levsen, Vagrant Cascadian & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

Planet Linux AustraliaMatthew Oliver: Setting up a basic keystone for Swift + Keystone dev work

As a Swift developer, I do most of my development in a Swift All In One (SAIO) environment. This environment simulates a multinode swift cluster on one box. All the SAIO documentation points to using tempauth for authentication. Why?

Because most of the time authentication isn’t the thing we are working on. Swift has many moving parts, and so tempauth, which only exists for testing swift and is configured in the proxy.conf file, works great.

However, there are times you need to debug or test keystone + swift integration. In this case, we tend to build up a devstack just for the keystone component. But if all we need is keystone, can we just throw one up on a SAIO?… yes. So this is how I do it.

Firstly, I’m going to assume you have a SAIO already set up. If not, go do that first. Not that it really matters, as we only configure the SAIO keystone component at the end. But I will be making keystone listen on localhost, so if you are doing this on another machine, you’ll have to change that.

Further, this will set up a keystone server in the form you’d expect from a real deploy (setting up the admin and public interfaces).


Step 1 – Get the source, install and start keystone

Clone the source code:
cd $HOME
git clone

Setup a virtualenv (optional):
mkdir -p ~/venv/keystone
virtualenv ~/venv/keystone
source ~/venv/keystone/bin/activate

Install keystone:
cd $HOME/keystone
pip install -r requirements.txt
pip install -e .
cp etc/keystone.conf.sample etc/keystone.conf

Note: We are running the services from source, so the config lives in the source tree’s etc directory.


The fernet key setup seems to assume a full /etc path, so we’ll create it. Maybe I should update this to put all the config there, but for now, meh:
sudo mkdir -p /etc/keystone/fernet-keys/
sudo chown $USER -R /etc/keystone/

Setup the database and fernet:
keystone-manage db_sync
keystone-manage fernet_setup

Finally we can start keystone. Keystone is a WSGI application and so needs a server to pass it requests. The current keystone developer documentation seems to recommend uwsgi, so let’s do that.


First we need uwsgi and its python plugin; on a debian/ubuntu system:
sudo apt-get install uwsgi uwsgi-plugin-python

Then we can start keystone, by starting the admin and public wsgi servers (on ports 35357 and 5000 respectively):
uwsgi --http :35357 --wsgi-file $(which keystone-wsgi-admin) &
uwsgi --http :5000 --wsgi-file $(which keystone-wsgi-public) &

Note: Here I am just backgrounding them; you could run them in tmux or screen, or set up uwsgi to run them all the time. But that’s out of scope for this.


Now a netstat should show that keystone is listening on ports 35357 and 5000:
$ netstat -ntlp | egrep '35357|5000'
tcp        0      0 0.0.0.0:5000            0.0.0.0:*               LISTEN      26916/uwsgi
tcp        0      0 0.0.0.0:35357           0.0.0.0:*               LISTEN      26841/uwsgi

Step 2 – Setting up keystone for swift

Now that we have keystone started, it’s time to configure it. Firstly you need the openstack client to configure it, so:
pip install python-openstackclient

Next we’ll use all the keystone defaults, so we only need to pick an admin password. For the sake of this how-to I’ll use the developer documentation’s example of `s3cr3t`. Be sure to change this. We can then do a basic keystone bootstrap with:
keystone-manage bootstrap --bootstrap-password s3cr3t

Now we just need to set some openstack env variables so we can use the openstack client to finish the setup. To make them easy to access I’ll dump them into a file you can source. But feel free to put these in your bashrc or wherever:
cat > ~/keystone.env <<EOF
export OS_USERNAME=admin
export OS_PASSWORD=s3cr3t
export OS_PROJECT_NAME=admin
export OS_USER_DOMAIN_ID=default
export OS_PROJECT_DOMAIN_ID=default
export OS_AUTH_URL=http://localhost:5000/v3
EOF

source ~/keystone.env


Great, now we can finish configuring keystone. Let’s first set up a service project (tenant) for our Swift cluster:
openstack project create service

Create a user for the cluster to auth as when checking user tokens, and add that user to the service project. Again we need to pick a password for this user, so `Sekr3tPass` will do; don’t forget to change it:
openstack user create swift --password Sekr3tPass --project service
openstack role add admin --project service --user swift

Now we will create the object-store (swift) service and add the endpoints for the service catalog:
openstack service create object-store --name swift --description "Swift Service"
openstack endpoint create swift public "http://localhost:8080/v1/AUTH_\$(tenant_id)s"
openstack endpoint create swift internal "http://localhost:8080/v1/AUTH_\$(tenant_id)s"

Note: The AUTH prefix in these URLs is the reseller_prefix we want to use in Swift. If you change it in Swift, make sure you update it here too.
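To see what the `\$(tenant_id)s` template expands to, here’s a quick sketch with a made-up tenant id (the id is hypothetical, purely for illustration; the AUTH prefix matches the endpoints above):

```shell
# Hypothetical tenant/project id, just to show the shape of the storage URL
# keystone's catalog will hand back for the object-store endpoint.
TENANT_ID=b35ba2b275ba4d78
echo "http://localhost:8080/v1/AUTH_${TENANT_ID}"
```

So every project gets its own Swift account, namespaced by the reseller_prefix.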


Now we can add roles that will map to roles in Swift, namely an operator (someone who will get a Swift account) and reseller_admins:
openstack role create SwiftOperator
openstack role create ResellerAdmin

Step 3 – Set up some keystone users to auth as

TODO: create all the tempauth users here


Here, it would make sense to create the tempauth users devs are used to, but I’ll just create one user so you know how to do it. First create a project (tenant) for this demo:
openstack project create --domain default --description "Demo Project" demo

Create a user:
openstack user create --domain default --password-prompt matt

We’ll also go create a basic user role:
openstack role create user

Now connect the 3 pieces together by adding user matt to the demo project with the user role:
openstack role add --project demo --user matt user

If you wanted user matt to be a swift operator (have an account) you’d:
openstack role add --project demo --user matt SwiftOperator

or even a reseller_admin:
openstack role add --project demo --user matt ResellerAdmin

If you’re in a virtualenv, you can leave it now, because next we’re going back to your already-set-up Swift to do the Swift -> Keystone part.

Step 4 – Configure Swift

To get swift to talk to keystone we need to add two middlewares to the proxy pipeline, and in the case of a SAIO, remove the tempauth middleware. But before we do that, we need to install keystonemiddleware, which provides one of the two middlewares, keystone’s authtoken:
sudo pip install keystonemiddleware

Now you want to replace the tempauth middleware in the proxy pipeline with authtoken and keystoneauth, so it looks something like:
pipeline = catch_errors gatekeeper healthcheck proxy-logging cache bulk tempurl ratelimit crossdomain container_sync authtoken keystoneauth staticweb copy container-quotas account-quotas slo dlo versioned_writes proxy-logging proxy-server
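If you’d rather script that edit than do it by hand, a sed one-liner works; here’s a sketch against a throwaway copy of the config (the sample pipeline is shortened, and I assume tempauth appears exactly once, surrounded by spaces):

```shell
# Build a throwaway proxy-server.conf-style file with a shortened pipeline.
cat > /tmp/proxy-server.conf <<'EOF'
[pipeline:main]
pipeline = catch_errors healthcheck cache tempauth proxy-server
EOF

# Swap tempauth for authtoken + keystoneauth in place.
sed -i 's/ tempauth / authtoken keystoneauth /' /tmp/proxy-server.conf

# Show the result.
grep '^pipeline' /tmp/proxy-server.conf
```

Run it against your real proxy-server.conf only after checking the pipeline line matches what you expect.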

Then in the same ‘proxy-server.conf’ file you need to add the paste filter sections for both of these new middlewares:
[filter:authtoken]
paste.filter_factory = keystonemiddleware.auth_token:filter_factory
auth_host = localhost
auth_port = 35357
auth_protocol = http
auth_uri = http://localhost:5000/
admin_tenant_name = service
admin_user = swift
admin_password = Sekr3tPass
delay_auth_decision = True
# cache = swift.cache
# include_service_catalog = False

[filter:keystoneauth]
use = egg:swift#keystoneauth
# reseller_prefix = AUTH
operator_roles = admin, SwiftOperator
reseller_admin_role = ResellerAdmin

Note: If you change the reseller_prefix here, make sure you change it in keystone too. And notice this is where you map operator_roles and reseller_admin_role in Swift to roles in keystone. Here, anyone with the keystone role admin or SwiftOperator is a swift operator, and those with the ResellerAdmin role are reseller_admins.


And that’s it. Now you should be able to restart your swift proxy and it’ll go off and talk to keystone.


You can now use the Python swiftclient to talk to your cluster, and what’s better, swiftclient understands the OS_* variables, so you can just source your keystone.env and talk to your cluster (as admin), or export some new envs for the user you’ve created. You can use curl if you want, but it is _much_ easier to use swiftclient.


Tip: You can use `swift auth` to get the auth_token if you want to then use curl.


If you want to authenticate via curl then for v3, use:


Or for v2, I use:
auth='{"auth": {"tenantName": "demo", "passwordCredentials": {"username": "matt", "password": ""}}}'


curl -s -d "$auth" -H 'Content-type: application/json' $url |python -m json.tool



curl -s -d "$auth" -H 'Content-type: application/json' $url |python -c "import sys, json; print json.load(sys.stdin)['access']['token']['id']"

To just print out the token. Although a simple swift auth would do all this for you.
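For v3, the tokens API expects a rather more nested JSON body. Here is a sketch of what I understand it to look like (the shape follows the Keystone v3 tokens API; the user and project match the demo ones created earlier, and the password is a placeholder):

```shell
# v3 password-auth request body for the demo user created earlier.
# "CHANGEME" is a placeholder — substitute the real password.
cat > /tmp/v3-auth.json <<'EOF'
{
  "auth": {
    "identity": {
      "methods": ["password"],
      "password": {
        "user": {"name": "matt", "domain": {"id": "default"}, "password": "CHANGEME"}
      }
    },
    "scope": {"project": {"name": "demo", "domain": {"id": "default"}}}
  }
}
EOF

# Sanity-check that the body is well-formed JSON before sending it.
python3 -m json.tool < /tmp/v3-auth.json > /dev/null && echo valid

# Then POST it to keystone; the token comes back in the X-Subject-Token header:
# curl -si -d @/tmp/v3-auth.json -H 'Content-type: application/json' \
#   http://localhost:5000/v3/auth/tokens | grep -i x-subject-token
```

Note that unlike v2, the v3 token is returned in a response header rather than in the JSON body.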

Krebs on SecurityIf Your iPhone is Stolen, These Guys May Try to iPhish You

KrebsOnSecurity recently featured the story of a Brazilian man who was peppered with phishing attacks trying to steal his Apple iCloud username and password after his wife’s phone was stolen in a brazen daylight mugging. Today, we’ll take an insider’s look at an Apple iCloud phishing gang that appears to work quite closely with organized crime rings — within the United States and beyond  — to remotely unlock and erase stolen Apple devices.

Victims of iPhone theft can use the Find My iPhone feature to remotely locate, lock or erase their iPhone — just by visiting Apple’s site and entering their iCloud username and password. Likewise, an iPhone thief can use those iCloud credentials to remotely unlock the victim’s stolen iPhone, wipe the device, and resell it. As a result, iPhone thieves often subcontract the theft of those credentials to third-party iCloud phishing services. This story is about one of those services.


The iCloud account phishing text that John’s friend received months after losing a family iPhone.

Recently, I heard from a security professional whose close friend received a targeted attempt to phish his Apple iCloud credentials. The phishing attack came several months after the friend’s child lost his phone at a public park in Virginia. The phish arrived via text message and claimed to have been sent from Apple. It said the device tied to his son’s phone number had been found, and that its precise location could be seen for the next 24 hours by clicking a link embedded in the text message.

That security professional source — referred to as “John” for simplicity’s sake — declined to be named or credited in this story because some of the actions he took to gain the knowledge presented here may run afoul of U.S. computer fraud and abuse laws.

John said his friend clicked on the link in the text message he received about his son’s missing phone and was presented with a fake iCloud login page: appleid-applemx[dot]us. A lookup on that domain indicates it is hosted on a server in Russia that is or was shared by at least 140 other domains — mostly other apparent iCloud phishing sites — such as accounticloud[dot]site; apple-appleid[dot]store; apple-devicefound[dot]org; and so on (a full list of the domains at that server is available here).

While the phishing server may be hosted in Russia, its core users appear to be in a completely different part of the world. Examining the server more closely, John noticed that it was (mis)configured in a way that leaked data about various Internet addresses that were seen recently accessing the server, as well as the names of specific directories on the server that were being accessed.

After monitoring that logging information for some time, my source discovered there were five Internet addresses that communicated with the server multiple times a day, and that those addresses corresponded to devices located in Argentina, Colombia, Ecuador and Mexico.

He also found a file openly accessible on the Russian server which indicated that an application running on the server was constantly sending requests to two services that allow anyone to look up information about a mobile device by entering its unique International Mobile Equipment Identity (IMEI) number. These services return a variety of information, including the make and model of the phone, whether Find My iPhone is enabled for the device, and whether the device has been locked or reported stolen.

John said that as he was conducting additional reconnaissance of the Russian server, he tried to access “index.php” — which commonly takes one to a site’s home page — when his browser was redirected to “login.php” instead. The resulting page, pictured below, is a login page for an application called “iServer.” The login page displays a custom version of Apple’s trademarked logo as part of a pirate’s skull and crossbones motif, set against a background of bleeding orange flames.


The login page for an Apple iCloud credential phishing operation apparently used to unlock and remotely wipe stolen iPhones.

John told me that in addition to serving up that login page, the server also returned the HTML contents of the “index.php” he originally requested from the server. When he saved the contents of index.php to his computer and viewed it as a text file, he noticed it inexplicably included a list of some 137 user names, email addresses and expiration dates for various users who’d apparently paid a monthly fee to access the iCloud phishing service.

“These appear to be ‘resellers’ or people that have access to the crimeware server,” my source said of the user information listed in the server’s “index.php” file.


John told KrebsOnSecurity that with very little effort he was able to guess the password of at least two other users listed in that file. After John logged into the iCloud phishing service with those credentials, the service informed him that the account he was using had expired. John was then prompted to pay for at least one more month’s subscription to the server to continue.

Playing along, John said he clicked the “OK” button indicating he wished to renew his subscription, and was taken to a shopping cart hosted on the domain hostingyaa[dot]com. That payment form in turn was accepting PayPal payments for an account tied to an entity called HostingYaa LLC; viewing the HTML source on that payment page revealed the PayPal account was tied to the email address “admin@hostingyaa[dot]com.”

According to the file coughed up by the Russian server, the first username in that user list — demoniox12 — is tied to an email address and to a zero-dollar subscription to the phishing service. This strongly indicates the user in question is an administrator of this phishing service.

A review of Lanzadorx[dot]net indicates that it is a phishing-as-a-service offering that advertises the ability to launch targeted phishing attacks at a variety of free online services, including accounts at Apple, Hotmail, Gmail and Yahoo, among others.

A reverse WHOIS lookup ordered from Domaintools shows that the email is linked to the registration data for exactly two domains — hostingyaa[dot]info and lanzadorx[dot]net [full disclosure: Domaintools is currently one of several advertisers on KrebsOnSecurity].

Hostingyaa[dot]info is registered to a Dario Dorrego, one of the other zero-dollar accounts included near the top of the list of users that are authorized to access the iCloud phishing service. The site says Dorrego’s account corresponds to the email address dario@hostingyaa[dot]com. That name Dario Dorrego also appears in the site registration records for 31 other Web site domains, all of which are listed here.

John said he was able to guess the passwords for at least six other accounts on the iCloud phishing service, including one particularly interesting user and possible reseller of the service who picked the username “Jonatan.” Below is a look at the home screen for Jonatan’s account on this iCloud phishing service. We can see the system indicates Jonatan was able to obtain at least 65 “hacked IDs” through this service, and that he pays USD $80 per month for access to it.


“Jonatan,” a user of this iCloud account credential phishing service. Note the left side panel indicates the number of records and hacked IDs recorded for Jonatan’s profile.

Here are some of the details for “Tanya,” one such victim tied to Jonatan’s account. Tanya’s personal details have been redacted from this image:


This page from the iCloud phishing service shows the redacted account details phished from an iPhone user named Tanya.

Here is the iCloud phishing page Tanya would have seen if she clicked the link sent to her via text message. Note that the victim’s full email address is automatically populated into the username portion of the login page to make the scam feel more like Apple’s actual iCloud site:


The page below from Jonatan’s profile lists each of his 60+ victims individually, detailing their name, email address, iCloud password, phone number, unique device identifier (IMEI), iPhone model/generation and some random notes apparently inserted by Jonatan:


The next screen shot shows the “SMS sent” page. It tracks which victims were sent which variation of phishing scams offered by the site; whether targets had clicked a link in the phony iCloud phishing texts; and if any of those targets ever visited the fake iCloud login pages:


Users of this phishing service can easily add a new phishing domain if their old links get cleaned up or shut down by anti-phishing and anti-spam groups. This service also advertises the ability to track when phishing links have been flagged by anti-phishing companies:


This is where the story turns both comical and ironic. Many times, attackers will test their exploit on themselves whilst failing to fully redact their personal information. Jonatan apparently tested the phishing attacks on himself using his actual Apple iCloud credentials, and this data was indexed by Jonatan’s phishing account at the fake iCloud server. In short, he phished himself and forgot to delete the successful results. Sorry, but I’ve blurred out Jonatan’s iCloud password in the screen shot here:


See if you can guess what John did next? Yes, he logged into Jonatan’s iCloud account. Helpfully, one of the screenshots in the photos saved to Jonatan’s iCloud account is of Jonatan logged into the same phishing server that leaked his iCloud account information!


The following advertisement for Jonatan’s service — also one of the images John found in Jonatan’s iCloud account — includes the prices he charges for his own remote iPhone unlocking service. It appears the pricing is adjusted upwards considerably for phishing attacks on newer model stolen iPhones. The price for phishing an iPhone 4 or 4s is $40 per message, versus $120 per message for phishing attacks aimed at iPhone 6s and 6s plus users. Presumably this is because the crooks hiring this service stand to make more money selling newer phones.


The email address that Jonatan used to register on the Apple iPhone phishing service — shown in one of the screen shots above — also was used to register an account on Facebook tied to a Jonatan Rodriguez who says he is from Puerto Rico. It just so happens that this Jonatan Rodriguez on Facebook also uses his profile to advertise a “Remove iCloud” service. What are the odds?


Jonatan’s Facebook profile page.

Well, pretty good considering this Facebook user also is the administrator of a Facebook Group called iCloud Unlock Ecuador – Worldwide. Incredibly, Facebook says there are 2,797 members of this group. Here’s what they’re all about:


Jonatan’s Facebook profile picture would have us believe that he is a male model, but the many selfies he apparently took and left in his iCloud account show a much softer side of Jonatan:


Jonatan, in a selfie he uploaded to his iCloud account. Jonatan unwittingly gave away the credentials to his iCloud account because the web site where his iCloud account phishing service provider was hosted had virtually no security (nor did Jonatan, apparently). Other photos in his archive include various ads for his iPhone unlocking service.

Among the members of this Facebook group is one “Alexis Cadena,” whose name appears in several of the screenshots tied to Jonatan’s account in the iCloud phishing service:


Alexis Cadena apparently also has his own iCloud phishing service. It’s not clear if he sub-lets it from Jonatan or what, but here are some of Alexis’s ads:


Coming back to Jonatan, the beauty of the iCloud service (and the lure used by Jonatan’s phishing service) is that iPhones can be located fairly accurately, down to a specific address. Alas, because Jonatan phished his own iCloud account, we can see that according to Jonatan’s iCloud service, his phone was seen in the following neighborhood in Ecuador on March 7, 2017. The map shows a small radius of a few blocks within Yantzaza, a town of 10,000 in southern Ecuador:


Jonatan’s home town, according to the results of his “find my iphone” feature in iCloud.

Jonatan did not respond to multiple requests for comment.

Planet Linux AustraliaJames Morris: LSM mailing list archive: this time for sure!

Following various unresolved issues with existing mail archives for the Linux Security Modules mailing list, I’ve set up a new archive here.

It’s a mailman mirror of the vger list.


Planet DebianSean Whitton: Initial views of 5th edition DnD

I’ve been playing in a 5e campaign for around two months now. In the past ten days or so I’ve been reading various source books and Internet threads regarding the design of 5th edition. I’d like to draw some comparisons and contrasts between 5th edition, and the 3rd edition family of games (DnD 3.5e and Paizo’s Pathfinder, which may be thought of as 3.75e).

The first thing I’d like to discuss is that wizards and clerics are no longer Vancian spellcasters. In rules terms, this is the idea that individual spells are pieces of ammunition. Spellcasters have a list of individual spells stored in their heads, and as they cast spells from that list, they cross off each item. Barring special rules about spontaneously converting prepared spells to healing spells, for clerics, the only way to add items back to the list is to take a night’s rest. Contrast this with spending points from a pool of energy in order to use an ability to cast a fireball. Then the limiting factor on using spells is having enough points in your mana pool, not having further castings of the spell waiting in memory.

One of the design goals of 5th edition was to reduce the dominance of spellcasters at higher levels of play. The article to which I linked in the previous paragraph argues that this rebalancing requires the removal of Vancian magic. The idea, to the extent that I’ve understood it, is that Vancian magic is not an effective restriction on spellcaster power levels, so it is to be replaced with other restrictions—adding new restrictions while retaining the restrictions inherent in Vancian magic would leave spellcasters crippled.

A further reason for removing Vancian magic was to defeat the so-called “five minute adventuring day”. The combat ability of a party that contains higher-level Vancian spellcasters drops significantly once they’ve fired off their most powerful combat spells. So adventuring groups would find themselves getting into a fight, and then immediately retreating to fully rest up in order to get their spells back. This removes interesting strategic and roleplaying possibilities involving the careful allocation of resources, and continuing to fight as hit points run low.

There are some other related changes. Spell components are no longer used up when casting a spell. So you can use one piece of bat guano for every fireball your character ever casts, instead of each casting requiring a new piece. Correspondingly, you can use a spell focus, such as a cool wand, instead of a pouch full of material components—since the pouch never runs out, there’s no mechanical change if a wizard uses an arcane focus instead. 0th level spells may now be cast at will (although Pathfinder had this too). And there are decent 0th level attack spells, so a spellcaster need not carry a crossbow or shortbow in order to have something to do on rounds when it would not be optimal to fire off one of their precious spells.

I am very much in favour of these design goals. The five minute adventuring day gets old fast, and I want it to be possible for the party to rely on the cool abilities of non-spellcasters to deal with the challenges they face. However, I am concerned about the flavour changes that result from the removal of Vancian magic. These affect wizards and clerics differently, so I’ll take each case in turn.

Firstly, consider wizards. In third edition, a wizard had to prepare and cast Read Magic (the only spell they could prepare without a spellbook), and then set about working through their spellbook. This involved casting the spells they wanted to prepare, up until the last few triggering words or gestures that would cause the effect of the spell to manifest. They would commit these final parts of the spell to memory. When it came to casting the spell, the wizard would say the final few words and make the required gestures, and bring out relevant material components from their component pouch. The completed spell would be ripped out of their mind, to manifest its effect in the world. We see that the casting of a spell is a highly mentally-draining activity—it rips the spell out of the caster’s memory!—not to be undertaken lightly. Thus it is natural that a wizard would learn to use a crossbow for basic damage-dealing. Magic is not something that comes very naturally to the wizard, to be deployed in combat as readily as the fighter swings their sword. They are not a superhero or video game character, “pew pew”ing their way to victory. This is a very cool starting point upon which to roleplay an academic spellcaster, not really available outside of tabletop games. I see it as a distinction between magical abilities and real magic.

Secondly, consider clerics. Most of the remarks in the previous paragraph apply, suitably reworked to be in terms of requesting certain abilities from the deity to whom the cleric is devoted. Additionally, there is the downgrading of the importance of the cleric’s healing magic in 5th edition. Characters can heal themselves by taking short and long rests. Previously, natural healing was very slow, so a cleric would need to convert all their remaining magic to healing spells at the end of the day, and hope that it was enough to bring the party up to fighting shape. Again, this made the party of adventurers seem less like superheroes or video game characters. Magic had a special, important and unique role, that couldn’t be replaced by the abilities of other classes.

There are some rules in the back of the DMG—“Slow Natural Healing”, “Healing Kit Dependency”, “Lingering Wounds”—which can be used to make healing magic more important. I’m not sure how well they would work without changes to the cleric class.

I would like to find ways to restore the feel and flavour of Vancian clerics and wizards to 5th edition, without sacrificing the improvements that have been made that let other party members do cool stuff too. I hope it is possible to keep magic cool and unique without making it dominate the game. It would be easy to forbid the use of arcane foci, and say that material component pouches run out if the party do not visit a suitable marketplace often enough. This would not have a significant mechanical effect, and could enhance roleplaying possibilities. I am not sure how I could deal with the other issues I’ve discussed without breaking the game.

The second thing I would like to discuss is bounded accuracy. Under this design principle, the modifiers to dice rolls grow much more slowly. The gain of hit points remains unbounded. Under third edition, it was mechanically impossible for a low-level monster to land a hit on a higher-level adventurer, rendering them totally useless even in overwhelming numbers. With bounded accuracy, it’s always possible for a low-level monster to hit a PC, even if they do insignificant damage. That means that multiple low-level monsters pose a threat.

This change opens up many roleplaying opportunities by keeping low-level character abilities relevant, as well as monster types that can remain involved in stories without being given implausible new abilities so they don’t fall far behind the PCs. However, I’m a little worried that it might make high level player characters feel a lot less powerful to play. I want to cease to be a fragile adventurer and become a world-changing hero at later levels, rather than forever remain vulnerable to the things that I was vulnerable to at the start of the game. This desire might just be the result of the video games which I played growing up. In the JRPGs I played and in Diablo II, enemies in earlier areas of the map were no threat at all once you’d levelled up by conquering higher-level areas. My concerns about bounded accuracy might just be that it clashes with my own expectations of how fantasy heroes work. A good DM might be able to avoid these worries entirely.

The final thing I’d like to discuss is the various simplifications to the rules of 5th edition, when it is compared with 3rd edition and Pathfinder. Attacks of opportunity are only provoked when leaving a threatened square; you can go ahead and cast a spell when in melee with someone. There is a very short list of skills, and party members are much closer to each other in skills, now that you can’t pump more and more ranks into one or two abilities. Feats as a whole are an optional rule.

At first I was worried about these simplifications. I thought that they might make character building and tactics in combat a lot less fun. However, I am now broadly in favour of all of these changes, for two reasons. Firstly, they make the game so much more accessible, and make it far more viable to play without relying on a computer program to fill in the boxes on your character sheet. In my 5th edition group, two of us have played 3rd edition games, and the other four have never played any tabletop games before. But nobody has any problems figuring out their modifiers because it is always simply your ability bonus or penalty, plus your proficiency bonus if relevant. And advantage and disadvantage is so much more fun than getting an additional plus or minus two. Secondly, these simplifications downplay the importance of the maths, which means it is far less likely to be broken. It is easier to ensure that a smaller core of rules is balanced than it is to keep in check a larger mass of rules, constantly being supplemented by more and more addon books containing more and more feats and prestige classes. That means that players make their characters cool by roleplaying them in interesting ways, not making them cool by coming up with ability combos and synergies in advance of actually sitting down to play. Similarly, DMs can focus on flavouring monsters, rather than writing up longer stat blocks.
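For the curious, advantage is worth about +3.3 on average on a d20 (and more than that in the middle of the range), which is easy to check with a quick simulation. A rough sketch, not anything from the source books:

```shell
# Monte Carlo sketch: average of a plain d20, a d20+2, and a d20 with
# advantage (roll two, keep the higher). True expectations are 10.5,
# 12.5 and 13.825 respectively.
awk 'BEGIN {
  srand(1); n = 200000
  for (i = 0; i < n; i++) {
    a = int(rand() * 20) + 1          # first d20
    b = int(rand() * 20) + 1          # second d20
    plain += a
    plus2 += a + 2
    adv   += (a > b ? a : b)          # advantage keeps the higher roll
  }
  printf "plain %.1f  +2 %.1f  advantage %.1f\n", plain/n, plus2/n, adv/n
}'
```

So advantage beats a flat +2 on average, while still leaving a natural 1 and a natural 20 possible — which is part of why it feels better at the table.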

I think that this last point reflects what I find most worthwhile about tabletop RPGs. I like characters to encounter cool NPCs and cool situations, and then react in cool ways. I don’t care that much about character creation. (I used to care more about this, but I think it was mainly because of interesting options for magic items, which hasn’t gone away.) The most important thing is exercising group creativity while actually playing the game, rather than players and DMs having to spend a lot of time preparing the maths in advance of playing. Fifth edition enables this by preventing the rules from getting in the way, because they’re broken or overly complex. I think this is why I love Exalted: stunting is vital, and there is social combat. I hope to be able to work out a way to restore Vancian magic, but even without that, on balance, fifth edition seems like a better way to do group storytelling about fantasy heroes. Hopefully I will have an opportunity to DM a 5th edition campaign. I am considering disallowing all homebrew and classes and races from supplemental books. Stick to the well-balanced core rules, and do everything else by means of roleplaying and flavour. This is far less gimmicky, if more work for unimaginative players (such as myself!).

Some further interesting reading:

CryptogramThe CIA's "Development Tradecraft DOs and DON'Ts"

Useful best practices for malware writers, courtesy of the CIA. Seems like a lot of good advice.


  • DO obfuscate or encrypt all strings and configuration data that directly relate to tool functionality. Consideration should be made to also only de-obfuscating strings in-memory at the moment the data is needed. When a previously de-obfuscated value is no longer needed, it should be wiped from memory.

    Rationale: String data and/or configuration data is very useful to analysts and reverse-engineers.

  • DO NOT decrypt or de-obfuscate all string data or configuration data immediately upon execution.

    Rationale: Raises the difficulty for automated dynamic analysis of the binary to find sensitive data.

  • DO explicitly remove sensitive data (encryption keys, raw collection data, shellcode, uploaded modules, etc) from memory as soon as the data is no longer needed in plain-text form. DO NOT RELY ON THE OPERATING SYSTEM TO DO THIS UPON TERMINATION OF EXECUTION.

    Rationale: Raises the difficulty for incident response and forensics review.

  • DO utilize a deployment-time unique key for obfuscation/de-obfuscation of sensitive strings and configuration data.

    Rationale: Raises the difficulty of analysis of multiple deployments of the same tool.

  • DO strip all debug symbol information, manifests (an MSVC artifact), build paths, and developer usernames from the final build of a binary.

    Rationale: Raises the difficulty for analysis and reverse-engineering, and removes artifacts used for attribution/origination.

  • DO strip all debugging output (e.g. calls to printf(), OutputDebugString(), etc) from the final build of a tool.

    Rationale: Raises the difficulty for analysis and reverse-engineering.

  • DO NOT explicitly import/call functions that are not consistent with a tool's overt functionality (e.g. WriteProcessMemory, VirtualAlloc, CreateRemoteThread, etc. for a binary that is supposed to be a notepad replacement).

    Rationale: Lowers potential scrutiny of binary and slightly raises the difficulty for static analysis and reverse-engineering.

  • DO NOT export sensitive function names; if exports are required for the binary, utilize an ordinal or a benign function name.

    Rationale: Raises the difficulty for analysis and reverse-engineering.

  • DO NOT generate crashdump files, coredump files, "Blue" screens, Dr Watson or other dialog pop-ups and/or other artifacts in the event of a program crash. DO attempt to force a program crash during unit testing in order to properly verify this.

    Rationale: Avoids suspicion by the end user and system admins, and raises the difficulty for incident response and reverse-engineering.

  • DO NOT perform operations that will cause the target computer to be unresponsive to the user (e.g. CPU spikes, screen flashes, screen "freezing", etc).

    Rationale: Avoids unwanted attention from the user or system administrator to tool's existence and behavior.

  • DO make all reasonable efforts to minimize binary file size for all binaries that will be uploaded to a remote target (without the use of packers or compression). Ideal binary file sizes should be under 150KB for a fully featured tool.

    Rationale: Shortens overall "time on air" not only to get the tool on target, but to time to execute functionality and clean-up.

  • DO provide a means to completely "uninstall"/"remove" implants, function hooks, injected threads, dropped files, registry keys, services, forked processes, etc whenever possible. Explicitly document (even if the documentation is "There is no uninstall for this ") the procedures, permissions required and side effects of removal.

    Rationale: Avoids unwanted data left on target. Also, proper documentation allows operators to make better operational risk assessment and fully understand the implications of using a tool or specific feature of a tool.

  • DO NOT leave dates/times such as compile timestamps, linker timestamps, build times, access times, etc. that correlate to general US core working hours (i.e. 8am-6pm Eastern time).

    Rationale: Avoids direct correlation to origination in the United States.

  • DO NOT leave data in a binary file that demonstrates CIA, USG, or its witting partner companies involvement in the creation or use of the binary/tool.

    Rationale: Attribution of binary/tool/etc by an adversary can cause irreversible impacts to past, present and future USG operations and equities.

  • DO NOT have data that contains CIA and USG cover terms, compartments, operation code names or other CIA and USG specific terminology in the binary.

    Rationale: Attribution of binary/tool/etc by an adversary can cause irreversible impacts to past, present and future USG operations and equities.

  • DO NOT have "dirty words" (see dirty word list - TBD) in the binary.

    Rationale: Dirty words, such as hacker terms, may cause unwarranted scrutiny of the binary file in question.
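
The first few rules above (per-deployment key, de-obfuscate only at the moment of use, wipe afterwards) amount to a small, well-known pattern. Here is an illustrative sketch, not anything from the document; the XOR scheme, the function names, and the sample string are all my own. Note that a real tool would need a language with manual memory control, since a high-level runtime may keep hidden copies of a buffer:

```python
import secrets

def obfuscate(plaintext: bytes, key: bytes) -> bytes:
    """XOR each byte with a repeating key (illustrative, not strong crypto)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(plaintext))

def deobfuscate_into(blob: bytes, key: bytes) -> bytearray:
    """De-obfuscate into a mutable buffer so it can be wiped afterwards."""
    return bytearray(b ^ key[i % len(key)] for i, b in enumerate(blob))

def wipe(buf: bytearray) -> None:
    """Overwrite the buffer in place once the value is no longer needed."""
    for i in range(len(buf)):
        buf[i] = 0

key = secrets.token_bytes(16)                 # unique key per deployment
blob = obfuscate(b"c2.example.invalid", key)  # stored only in obfuscated form

secret = deobfuscate_into(blob, key)          # de-obfuscated at the moment of use
# ... use bytes(secret) here ...
wipe(secret)                                  # wiped as soon as it is done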


  • DO use end-to-end encryption for all network communications. NEVER use networking protocols which break the end-to-end principle with respect to encryption of payloads.

    Rationale: Stifles network traffic analysis and avoids exposing operational/collection data.

  • DO NOT solely rely on SSL/TLS to secure data in transit.

    Rationale: Numerous man-in-middle attack vectors and publicly disclosed flaws in the protocol.

  • DO NOT allow network traffic, such as C2 packets, to be re-playable.

    Rationale: Protects the integrity of operational equities.

  • DO use IETF RFC-compliant network protocols as a blending layer. The actual data, which must be encrypted in transit across the network, should be tunneled through a well-known and standardized protocol (e.g. HTTPS).

    Rationale: Custom protocols can stand-out to network analysts and IDS filters.

  • DO NOT break compliance of an RFC protocol that is being used as a blending layer. (i.e. Wireshark should not flag the traffic as being broken or mangled)

    Rationale: Broken network protocols can easily stand-out in IDS filters and network analysis.

  • DO use variable size and timing (aka jitter) of beacons/network communications. DO NOT predictably send packets with a fixed size and timing.

    Rationale: Raises the difficulty of network analysis and correlation of network activity.

  • DO proper cleanup of network connections. DO NOT leave around stale network connections.

    Rationale: Raises the difficulty of network analysis and incident response.
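
The jitter rule above comes down to randomizing both the check-in interval and the packet size. A minimal sketch, with function names and bounds of my own choosing:

```python
import random

def next_beacon_delay(base_seconds: float, jitter: float = 0.5) -> float:
    """Pick the next check-in delay uniformly within +/- jitter of the base interval."""
    return random.uniform(base_seconds * (1.0 - jitter), base_seconds * (1.0 + jitter))

def pad_message(payload: bytes, min_total: int = 64, max_total: int = 512) -> bytes:
    """Pad the payload to a random total length so packet sizes are not constant."""
    target = random.randint(max(min_total, len(payload)), max(max_total, len(payload)))
    return payload + bytes(random.getrandbits(8) for _ in range(target - len(payload)))
```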

Disk I/O:

  • DO explicitly document the "disk forensic footprint" that could be potentially created by various features of a binary/tool on a remote target.

    Rationale: Enables better operational risk assessments with knowledge of potential file system forensic artifacts.

  • DO NOT read, write and/or cache data to disk unnecessarily. Be cognizant of 3rd party code that may implicitly write/cache data to disk.

    Rationale: Lowers potential for forensic artifacts and potential signatures.

  • DO NOT write plain-text collection data to disk.

    Rationale: Raises difficulty of incident response and forensic analysis.

  • DO encrypt all data written to disk.

    Rationale: Disguises intent of file (collection, sensitive code, etc) and raises difficulty of forensic analysis and incident response.

  • DO utilize a secure erase when removing a file from disk that wipes at a minimum the file's filename, datetime stamps (create, modify and access) and its content. (Note: The definition of "secure erase" varies from filesystem to filesystem, but at least a single pass of zeros of the data should be performed. The emphasis here is on removing all filesystem artifacts that could be useful during forensic analysis)

    Rationale: Raises difficulty of incident response and forensic analysis.

  • DO NOT perform Disk I/O operations that will cause the system to become unresponsive to the user or alerting to a System Administrator.

    Rationale: Avoids unwanted attention from the user or system administrator to tool's existence and behavior.

  • DO NOT use a "magic header/footer" for encrypted files written to disk. All encrypted files should be completely opaque data files.

    Rationale: Avoids signature of custom file format's magic values.

  • DO NOT use hard-coded filenames or filepaths when writing files to disk. This must be configurable at deployment time by the operator.

    Rationale: Allows operator to choose the proper filename that fits within the operational target.

  • DO have a configurable maximum size limit and/or output file count for writing encrypted output files.

    Rationale: Avoids situations where a collection task can get out of control and fill the target's disk, which would draw unwanted attention to the tool and/or the operation.
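
The single-zero-pass "secure erase" described above targets three artifacts: content, timestamps, and filename. A sketch of that sequence follows; as the list itself notes, on journaling or copy-on-write filesystems an in-place overwrite is not guaranteed to reach the original blocks, so treat this as illustrative only:

```python
import os
import secrets

def secure_erase(path: str) -> None:
    """Zero the content, clobber timestamps, replace the name, then unlink."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        f.write(b"\x00" * size)          # a single pass of zeros over the data
        f.flush()
        os.fsync(f.fileno())
    os.utime(path, (0, 0))               # overwrite access/modify timestamps
    anon = os.path.join(os.path.dirname(path) or ".", secrets.token_hex(8))
    os.rename(path, anon)                # replace the telltale filename
    os.remove(anon)
```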


  • DO use GMT/UTC/Zulu as the time zone when comparing date/time.

    Rationale: Provides consistent behavior and helps ensure "triggers/beacons/etc" fire when expected.

  • DO NOT use US-centric timestamp formats such as MM-DD-YYYY. YYYYMMDD is generally preferred.

    Rationale: Maintains consistency across tools, and avoids associations with the United States.
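
Both timestamp rules (compare in UTC, format as YYYYMMDD) are one-liners in most languages. A small sketch; the function names are my own:

```python
from datetime import datetime, timezone

def utc_stamp(dt=None) -> str:
    """Format a timestamp as YYYYMMDD in UTC rather than a US-centric format."""
    dt = dt or datetime.now(timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y%m%d")

def trigger_due(trigger_utc: datetime) -> bool:
    """Compare trigger times in UTC so triggers/beacons fire consistently across time zones."""
    return datetime.now(timezone.utc) >= trigger_utc
```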


  • DO NOT assume a "free" PSP product is the same as a "retail" copy. Test on all SKUs where possible.

    Rationale: While the PSP/AV product may come from the same vendor and appear to have the same features despite having different SKUs, they may not be identical.

  • DO test PSPs with live (or recently live) internet connection where possible. NOTE: This can be a risk vs gain balance that requires careful consideration and should not be haphazardly done with in-development software. It is well known that PSP/AV products with a live internet connection can and do upload software samples based on varying criteria.

    Rationale: PSP/AV products exhibit significant differences in behavior and detection when connected to the internet versus not.

Encryption: NOD publishes a Cryptography standard: "NOD Cryptographic Requirements v1.1 TOP SECRET.pdf". Besides the guidance provided here, the requirements in that document should also be met.

The crypto requirements are complex and interesting. I'll save commenting on them for another post.

News article.

TEDThese TED2017 speakers’ talks will be broadcast live to cinemas April 24 and 25

The speaker lineup for the TED2017 conference features more than 70 thinkers and doers from around the world — including a dozen or so whose unfiltered TED Talks will be broadcast live to movie theater audiences across the U.S. and Canada.

Presented with our partner BY Experience, our TED Cinema Experience event series offers three opportunities for audiences to join together and experience the TED2017 Conference, and its first two evenings feature live TED Talks. Below, find out who’s part of the live cinema broadcast (as with any live event, the speaker lineup is subject to change, of course!).

The listing below reflects U.S. and Canadian times; international audiences in 18 countries will experience TED captured live and time-shifted. Check locations and show times, and purchase tickets here >>

Opening Night Event: Monday, April 24, 2017
US: 8pm ET/ 7pm CT/ 6pm MT/ time-shifted to 8pm PT
Experience the electric opening night of TED, with half a dozen TED Talks and performances from:
Designer Anab Jain
Cyberspace analyst Laura Galante
Artist Titus Kaphar
Grandmaster and analyst Garry Kasparov
Author Tim Ferriss
The band OK Go
Rabbi Lord Jonathan Sacks

TED Prize Event: Tuesday, April 25, 2017
US: 8pm ET/ 7pm CT/ 6pm MT/ time-shifted to 8pm PT
On the second night of TED2017, the TED Prize screening offers a lineup of awe-inspiring speakers with big ideas for our future, including:
Champion Serena Williams
Physician and writer Atul Gawande
Data genius Anna Rosling Rönnlund
Movement artists Jon Boogz + Lil Buck
TED Prize winner Raj Panjabi, who will reveal for the first time plans to use his $1 million TED Prize to fund a creative, bold wish to spark global change.

TED‘Armchair archaeologists’ search 5 million tiles of Peru

Morning clouds reveal Machu Picchu, ancient city of the Incas. Peru is home to many archaeological sites — and citizen scientists are mapping the country with GlobalXplorer. Photo: Design Pics Inc./National Geographic Creative

GlobalXplorer, the citizen science platform for archaeology, launched two weeks ago. It’s the culmination of Sarah Parcak’s TED Prize wish and, already, more than 32,000 curious minds from around the world have started their training, learning to spot signs of ancient sites threatened by looters. Working together, the GlobalXplorer community has just finished searching the 5 millionth tile in Peru, the first country the platform is mapping.

“I’m thrilled,” said Parcak. “I had no idea we’d complete this many tiles so soon.”

“Expedition Peru” has users searching more than 250,000 square kilometers of highlands and desert, captured in high-resolution satellite imagery provided by DigitalGlobe. This large search area has been divided into 20 million tiles, each about the size of a few city blocks. Users look at tiles one at a time, and mark whether they see anything in the image that could be a looting pit. When 5–6 users flag a site as containing potential looting, Parcak’s team will step in to study it in more detail. “So far, the community has flagged numerous potential looting sites,” said Parcak. “We’ll be taking a look at each one and further investigating.”

GlobalXplorer volunteers are searching Peru, one tile at a time, looking for signs of looting. Each tile shows an area the size of a few city blocks. Photo: Courtesy of GlobalXplorer

When GlobalXplorer launched, The Guardian described its users as “armchair archaeologists.” As this growing community searches for signs of looting, it’s unlocking articles and videos from National Geographic’s archives that give greater context to the expedition. So far, four chapters are available — including one on the explorers whose work has shed light on the mysteries of Peru, and one on the Chavín culture known for its psychedelic religious rituals.

“Everyone will find things on GlobalXplorer,” said Parcak. “All users are making a real difference. I’ve had photos from my friends showing their kids working together to find sites, and emails from retirees who always wanted to be archaeologists but never could. It’s really heartwarming to see this work.”

Expedition Peru draws to a close on March 15, 2017. Start searching »

CryptogramDefense against Doxing

A decade ago, I wrote about the death of ephemeral conversation. As computers were becoming ubiquitous, some unintended changes happened, too. Before computers, what we said disappeared once we'd said it. Neither face-to-face conversations nor telephone conversations were routinely recorded. A permanent communication was something different and special; we called it correspondence.

The Internet changed this. We now chat by text message and e-mail, on Facebook and on Instagram. These conversations -- with friends, lovers, colleagues, fellow employees -- all leave electronic trails. And while we know this intellectually, we haven't truly internalized it. We still think of conversation as ephemeral, forgetting that we're being recorded and what we say has the permanence of correspondence.

That our data is used by large companies for psychological manipulation -- we call this advertising -- is well known. So is its use by governments for law enforcement and, depending on the country, social control. What made the news over the past year were demonstrations of how vulnerable all of this data is to hackers and the effects of having it hacked, copied, and then published online. We call this doxing.

Doxing isn't new, but it has become more common. It's been perpetrated against corporations, law firms, individuals, the NSA and -- just this week -- the CIA. It's largely harassment and not whistleblowing, and it's not going to change anytime soon. The data in your computer and in the cloud are, and will continue to be, vulnerable to hacking and publishing online. Depending on your prominence and the details of this data, you may need some new strategies to secure your private life.

There are two basic ways hackers can get at your e-mail and private documents. One way is to guess your password. That's how hackers got their hands on personal photos of celebrities from iCloud in 2014.

How to protect yourself from this attack is pretty obvious. First, don't choose a guessable password. This is more than not using "password1" or "qwerty"; most easily memorizable passwords are guessable. My advice is to generate passwords you have to remember by using either the XKCD scheme or the Schneier scheme, and to use large random passwords stored in a password manager for everything else.
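
The XKCD scheme mentioned above is a machine-chosen passphrase of several random common words. A minimal sketch of the idea, with a stand-in word list (a real passphrase needs a dictionary of several thousand words, and the list here is only illustrative):

```python
import secrets

# Stand-in list: a real implementation would load a dictionary of
# several thousand common words.
WORDS = ["correct", "horse", "battery", "staple", "orbit", "lantern", "velvet", "quartz"]

def xkcd_passphrase(n_words: int = 4) -> str:
    """Join n cryptographically random words; entropy is n * log2(len(word list))."""
    return " ".join(secrets.choice(WORDS) for _ in range(n_words))
```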

Second, turn on two-factor authentication where you can, like Google's 2-Step Verification. This adds another step besides just entering a password, such as having to type in a one-time code that's sent to your mobile phone. And third, don't reuse the same password on any sites you actually care about.

You're not done, though. Hackers have accessed accounts by exploiting the "secret question" feature and resetting the password. That was how Sarah Palin's e-mail account was hacked in 2008. The problem with secret questions is that they're not very secret and not very random. My advice is to refuse to use those features. Type randomness into your keyboard, or choose a really random answer and store it in your password manager.

Finally, you also have to stay alert to phishing attacks, where a hacker sends you an enticing e-mail with a link that sends you to a web page that looks almost like the expected page, but which actually isn't. This sort of thing can bypass two-factor authentication, and is almost certainly what tricked John Podesta and Colin Powell.

The other way hackers can get at your personal stuff is by breaking into the computers the information is stored on. This is how the Russians got into the Democratic National Committee's network and how a lone hacker got into the Panamanian law firm Mossack Fonseca. Sometimes individuals are targeted, as when China hacked Google in 2010 to access the e-mail accounts of human rights activists. Sometimes the whole network is the target, and individuals are inadvertent victims, as when thousands of Sony employees had their e-mails published by North Korea in 2014.

Protecting yourself is difficult, because it often doesn't matter what you do. If your e-mail is stored with a service provider in the cloud, what matters is the security of that network and that provider. Most users have no control over that part of the system. The only way to truly protect yourself is to not keep your data in the cloud where someone could get to it. This is hard. We like the fact that all of our e-mail is stored on a server somewhere and that we can instantly search it. But that convenience comes with risk. Consider deleting old e-mail, or at least downloading it and storing it offline on a portable hard drive. In fact, storing data offline is one of the best things you can do to protect it from being hacked and exposed. If it's on your computer, what matters is the security of your operating system and network, not the security of your service provider.

Consider this for files on your own computer. The more things you can move offline, the safer you'll be.

E-mail, no matter how you store it, is vulnerable. If you're worried about your conversations becoming public, think about an encrypted chat program instead, such as Signal, WhatsApp or Off-the-Record Messaging. Consider using communications systems that don't save everything by default.

None of this is perfect, of course. Portable hard drives are vulnerable when you connect them to your computer. There are ways to jump air gaps and access data on computers not connected to the Internet. Communications and data files you delete might still exist in backup systems somewhere -- either yours or those of the various cloud providers you're using. And always remember that there's always another copy of any of your conversations stored with the person you're conversing with. Even with these caveats, though, these measures will make a big difference.

When secrecy is truly paramount, go back to communications systems that are still ephemeral. Pick up the telephone and talk. Meet face to face. We don't yet live in a world where everything is recorded and everything is saved, although that era is coming. Enjoy the last vestiges of ephemeral conversation while you still can.

This essay originally appeared in the Washington Post.

Planet Linux AustraliaSridhar Dhanapalan: How to Create a Venn Diagram with Independent Intersections in PowerPoint

A Venn diagram can be a great way to explain a business concept. This is generally not difficult to create in modern presentation software. I often use Google Slides for its collaboration abilities.

Where it becomes difficult is when you want to add a unique colour/pattern to an intersection, where the circles overlap. Generally you will either get one circle overlapping another, or if you set some transparency then the intersection will become a blend of the colours of the circles.

I could not work out how to do this in Google Slides, so on this occasion I cheated and did it in Microsoft PowerPoint instead. I then imported the resulting slide into Slides.

This worked for me in PowerPoint for Mac 2016. The process is probably the same on Windows.

Firstly, create a SmartArt Venn Diagram

Insert > SmartArt > Relationship > Basic Venn


Separate the Venn circles

SmartArt Design > Convert > Convert to Shapes


Ungroup shapes

Shape Format > Group Objects > Ungroup


Split out the intersections

Shape Format > Merge Shapes > Fragment


From there, you can select the intersection as an independent shape. You can treat each piece separately. Try giving them different colours or even moving them apart.


This can be a simple but impactful way to get your point across.

CryptogramFBI's Exploit Against Tor

The Department of Justice is dropping all charges in a child-porn case rather than release the details of a hack against Tor.

Sociological ImagesMasculinity and Fidelity in Pop Music

Originally posted at the Gender & Society blog.

Two songs that seemed like they were on the radio every time I tuned into a pop station last summer were Omi’s single, “Cheerleader” (originally released in 2015) and Andy Grammer’s song, “Honey, I’m Good” (originally released in 2014). They’re both songs written for mass consumption. Between 2014 and 2015, “Cheerleader” topped the charts in over 20 countries around the world. And, while “Honey, I’m Good” had less mass appeal, it similarly found its way onto top hit lists around the world.

They’re different genres of music. But they both fall under the increasingly meaningless category of “pop.” And, because they both gained popularity around the same time, it was possible to hear them back to back on radio stations across the U.S. Both songs are about the same issue: both are ballads sung by men celebrating themselves for being faithful in their heterosexual relationships. Below is Omi’s “Cheerleader.” Here is the chorus:

“All these other girls are tempting / But I’m empty when you’re gone / And they say / Do you need me? / Do you think I’m pretty? / Do I make you feel like cheating? / And I’m like no, not really cause / Oh I think that I found myself a cheerleader / She is always right there when I need her / Oh I think that I found myself a cheerleader / She is always right there when I need her”

In Omi’s song, he situates himself as uninterested in cheating because he’s found a woman who believes in him more than he does. And this, he suggests, is worth his fidelity. Though, he does admit to being tempted, which also works to situate him as laudable because he “has options.”

Andy Grammer’s song is a different genre. And like Omi’s song, it’s catchy (though, apparently less catchy if pop charts are a good measure). Grammer’s video is dramatically different as well. It’s full of couples lip syncing his song while showing how long they’ve been faithful to one another. Again, and for comparison, below is the chorus:

“Nah nah, honey I’m good / I could have another but I probably should not / I’ve got somebody at home, and if I stay I might not leave alone / No, honey I’m good, I could have another but I probably should not / I’ve gotta bid you adieu and to another I will stay true”

Unlike Omi’s song, Grammer’s single is about a man at a bar without his significant other. He’s turning down drinks from a woman (or women), claiming that he doesn’t trust himself to be faithful if he gives in to the drink. Instead, he opts to leave the bar to ensure he doesn’t give in to this temptation.

Both songs are written in the same spirit. They’re songs that appear to be about women, but are actually anthems about what amazing men these guys are because… well, because they don’t cheat, but could.

I was struck by the common message, a message at least partially to blame for why we all heard them so much. And the message is that, for men in heterosexual relationships, resisting the temptation to be unfaithful is hard work. And this message helps to highlight key ingredients of contemporary hegemonic masculinities: heterosexuality and promiscuity. Both men are identifying as heterosexual throughout each song. But, you might think, they’re not identifying as promiscuous. So, how are they supporting this cultural ideal if they appear to be challenging it? The answer to that is all in the delivery.

Amy C. Wilkins studied the ways that a group of college Christian men navigated what she terms the “masculinity dilemma” of demonstrating themselves to be heterosexual and heterosexually active when they were in a group committed to abstinence. Wilkins discovered that they navigated this dilemma by enacting what she refers to as “collective processes of temptation” whereby they crafted a discourse about just how masculine they were by resisting the temptation to be heterosexually active. They ritualistically discussed the problem of heterosexual temptation. And, in so doing, Wilkins argues that the men she studied, “perform their heterosexuality collectively, aligning themselves with conventional assumptions about masculinity through the ritual invocation of temptation” (here: 353). It’s hard to craft an identity based on not doing something. But if you’re going to, Wilkins argues that temptation is key.

Similarly, Sarah Diefendorf found that young evangelical Christian men navigate their gender identities alongside pledges of sexual abstinence until marriage. Men in Diefendorf’s study used one another as “accountability partners” to make sure they didn’t cheat on their pledges, not only in relationships, but even with things like pornography or masturbation. As Diefendorf writes, “These confessions… enable these men to demonstrate a connection with hegemonic masculinity through claims of desire for future heterosexual practices” (here: 658-659). In C.J. Pascoe’s study of high school boys navigating tenuous gender and sexual identities, she refers to this process more generally as “compulsive heterosexuality.”

Both songs are meant to situate the two singers as great men, men to be admired. But, being able to listen to this message and “get it” means that you can take for granted the premise on which the songs are based—in this case, that men are hard-wired to be sexual scoundrels and that heterosexual women should count themselves lucky if they are fortunate enough to have landed a man committed to not living up to his wiring. Without understanding men as having a natural and apparently insatiable sexual wanderlust, these songs don’t make sense.

Both Omi and Grammer need the discourse of temptation to frame themselves as noble. If we want to challenge men to not cheat, we should challenge the idea that they’re working against biologically deterministic inclinations to do so. I’m not sure it would make a top 20 hit, but neither would it recuperate forms of gendered inequality through the guise of dismantling them.


*Thanks to Sarah Diefendorf for her edits and smart feedback on this post.

Tristan Bridges, PhD is a professor at The College at Brockport, SUNY. He is the co-editor of Exploring Masculinities: Identity, Inequality, Continuity, and Change with C.J. Pascoe and studies gender and sexual identity and inequality. You can follow him on Twitter here. Tristan also blogs regularly at Inequality by (Interior) Design.


Worse Than FailureCodeSOD: Still Empty

A few months ago, Hannes shared with us some recycled code. In that installment, we learned that one of his team mates has… issues with strings. And naming things. And basic language features.

These issues continue.

For example, imagine, if you will, that you have a file. This file contains certain characters you want to remove, and others that need to be converted into different forms: the letter “G” should become “6”, and “B” should become “8”. Think through how you might solve this.

private bool CleanFile(string FileName)
{
        bool result = false;
        try
        {
                string s1 = string.Empty, s2 = string.Empty;
                FileStream aFile = new FileStream(FileName, FileMode.Open);
                using (StreamReader sr = new StreamReader(aFile, System.Text.Encoding.Default))
                {
                        while (sr.Peek() > -1)
                        {
                                s1 += sr.ReadLine();
                        }

                        s2 = s1.Replace("INS", Environment.NewLine + "INS");
                        s1 = s2.Replace("␟", "");
                        s2 = s1.Replace("-", "");
                        s1 = s2.Replace("_", "");
                        s2 = s1.Replace("G", "6");
                        s1 = s2.Replace("B", "8");
                        s2 = s1.Replace(" ", "");
                        s1 = s2.Replace("^", "");
                        s2 = s1.Replace("•", "");
                        s1 = s2.Replace(":", "");
                        s2 = s1.Replace("<", "");
                        s1 = s2.Replace(">", "");
                        s2 = s1.Replace(".", "");
                        s1 = s2.Replace("£", "");
                        s2 = s1.Replace("/", "");
                        s1 = s2.Replace("(", "");
                        s2 = s1.Replace(")", "");
                        s1 = s2.Replace("A", "");
                        s2 = s1.Replace("?", "");// still empty - can be used
                        sr.Close();
                }
                using (StreamWriter sw = new StreamWriter(FileName))
                {
                        sw.Write(s2);
                        sw.Close();
                }
                result = true;
        }
        catch (Exception ex)
        {
                MessageBox.Show(ex.Message);
                result = false;
        }
        return result;
}

I’m going to ignore the pair of alternating elephants in the room for a moment, because I want to pick on one of those little pet-peeves: the exception handling. A generic exception handler is bad. An exception handler that does nothing but show the error in a message box is also bad. Especially in a private method.

We could puzzle over the use of both s1 and s2, we could roll our eyes at doing a close inside of a using block (which automatically does that for you), but look at this line, right here:

                        s2 = s1.Replace("?", "");// still empty - can be used

What does that mean? What could it mean? Why is it there? This sounds like something that should be featured on Screenshots of Despair. Still empty - can be used. It’s like a mantra. The left side and the right side of my brain keep trading it back and forth, like s1 = s2.Replace(…) but nothing’s being replaced.

I’m still empty. I can be used.
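
For contrast, the whole replacement chain collapses into a single translation table plus one substitution. Here is a sketch of that approach in Python rather than the article's C# (C# has an analogous one-pass approach, but this illustrates the idea compactly; the newline stands in for Environment.NewLine):

```python
# One pass with str.translate: G becomes 6, B becomes 8, and the junk
# characters are deleted, with no ping-pong between two string variables.
DELETE = "␟-_ ^•:<>.£/()A?"
TABLE = str.maketrans("GB", "68", DELETE)

def clean_text(text: str) -> str:
    # Prefix "INS" with a newline first, as the original does, then translate.
    return text.replace("INS", "\nINS").translate(TABLE)
```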

[Advertisement] Manage IT infrastructure as code across all environments with Puppet. Puppet Enterprise now offers more control and insight, with role-based access control, activity logging and all-new Puppet Apps. Start your free trial today!

Planet Linux AustraliaOpenSTEM: This Week in HASS – term 1, week 7

This week our youngest students are looking in depth at different types of celebrations; slightly older students are examining how people got around in the ‘Olden Days’; and our older primary students have some extra time to finish their activities from last week.

Foundation to Year 3

First car made in Qld, 1902

In the stand-alone Foundation (Prep) unit (F.1), students are discussing celebrations – which ones do we recognise in Australia, how these compare with celebrations overseas, and what were these celebrations like in days gone by. Our integrated Foundation (Prep) unit (F.5) and students in Years 1 (1.1), 2 (2.1) and 3 (3.1), are examining Transport in the Past – how did their grandparents get around? How did people get around 100 years ago? How did kids get to school? How did people do the shopping? Students even get to dream about how we might get around in the future…

Years 3 to 6

Making mud bricks

At OpenSTEM we recognise that good activities, which engage students and allow for real learning, take time. Nobody likes to get really excited about something and then be rushed through it and quickly moved on to something else. This part of the unit has lots of hands-on activities for Year 3 (3.5) students in an integrated class with Year 4, as well as Year 4 (4.1), 5 (5.1) and 6 (6.1) students. In recognition of that, two weeks are allowed for the students to really get into making Ice Ages and mud bricks, and working out how to survive the challenges of living in a Neolithic village – including how to trade, count and write. Having enough time allows for consolidation of learning, as well as allowing teachers to potentially split the class into different groups engaged in different activities, and then rotate the groups through the activities over a two-week period.


Chaotic IdealismQ&A: Why do people self-diagnose?

Q: Why do people diagnose themselves online with Asperger's syndrome, instead of getting a valid assessment? Isn’t this unethical? I see people all the time diagnosing themselves as “aspies”, when Autism Spectrum Disorder is a real disability. Why do people want to do so?

A: They do this because they cannot get a valid assessment. Some lack access to mental health care, especially in the United States. Some are being asked to pay such high fees that they would have to choose between an assessment and their monthly food budget. Some know that they could lose their jobs or be denied child custody if they were officially diagnosed. They are doing the best they can with the information they have.

When you lack medical care, have just fallen down the stairs, and see that your arm is bent the wrong way, it’s not unethical to diagnose yourself with a broken arm and splint it as best you can. What’s unethical is that somebody has denied you the right to see a doctor, or made broken arms so stigmatized that you fear anyone knowing you’ve broken yours.

ASD is a real disability, and most of those who self-diagnose have a real disability. Or do you really think they would identify themselves as having ASD if there were no significant impairment for them to worry about, no social confusion or communication impairment or sensory processing disorder? Sure, some of them are wrong about what exactly’s going on; maybe they say it’s autism when with a bit more study they’d realize it better matches ADHD, or social communication disorder, or schizoid personality disorder. But then, some people professionally diagnosed with autism are also misdiagnosed. Considering that the professionals get all the obvious cases because their parents bring them in, I think self-diagnosed people are doing pretty well when it comes to accuracy.

If you’re upset about this, then start advocating for universal access to mental health care and the removal of stigma around autism and disability in general. Until you’ve done that, don’t disparage those who self-diagnose.

Planet Linux AustraliaBinh Nguyen: Prophets/Genesis/Terraforming Mars, Seek Menu, and More

Obvious continuation from my previous other posts with regards to prophets/pre-cogs:


TEDA new map of the Peruvian Amazon, the race to explore the deep ocean, and a rock album reimagined

As usual, the TED community has lots of news to share this week. Below, some highlights.

A map to guide conservation. After almost eight years of airborne laser-guided imaging spectroscopy, Greg Asner has finally mapped all 300,000 square miles of the Peruvian Amazon. Highlighting forest types that are reasonably safe and those which are in danger, Asner’s map offers conservationists a strategic way to apply future efforts of protection, though not all scientists are convinced of its current benefits. For now, however, Asner remains committed to his approach, with current plans to modify his technology for eventual orbit. “[Once in orbit], we can map the changing biodiversity of the planet every month. That’s what we need to manage our extinction crisis.” (Watch Greg’s TED Talk)

A tree-like pavilion for London. Architect Francis Kéré, a Burkina Faso native known for his use of local building materials like clay, will construct the 2017 Serpentine Pavilion in London, the first African to do so. Kéré’s inspiration for the pavilion’s design is a tree, which he describes as the most important place in his village because it is where people gathered as a community. Each year, the Serpentine Galleries commission a leading architect to build a temporary summer pavilion; previous architects include fellow TEDsters Bjarke Ingels and Frank Gehry. (Watch Francis’ TED Talk, Bjarke’s TED Talk, and Frank’s TED Talk)

The ICIJ goes independent. Less than a year after publishing the largest investigation in journalism history, known as the Panama Papers, the International Consortium of Investigative Journalists (ICIJ) announced in February that they were breaking away from the Center for Public Integrity, which founded ICIJ in 1997. Under the continued leadership of Gerard Ryle, ICIJ will become a fully independent nonprofit news organization. (Watch Gerard’s TED Talk)

A virtual forest aids a real one. Under the direction of Honor Harger, Singapore’s ArtScience Museum launched an interactive exhibit dedicated to rainforest conservation in Southeast Asia. The show, titled Into the Wild: An Immersive Virtual Adventure, creates over 1,000 square meters of virtual rainforest in the museum’s public spaces, which users can explore with their smartphones. The exhibit features a parallel with reality: for every virtual tree planted (and accompanied by a pledge to WWF), a real tree will be planted in a rainforest in Indonesia. (Watch Honor’s TED Talk)

New inductees in the Women’s Hall of Fame. Autism and livestock advocate Temple Grandin and actor Aimee Mullins are two of the ten women selected to be inducted into the National Women’s Hall of Fame 2017 class. The group will meet on September 16 during a ceremony in New York’s Seneca Falls. (Watch Aimee’s TED Talk and Temple’s TED Talk)

The race to explore the deep ocean. In December 2015, Peter Diamandis’ XPrize Foundation announced the Shell Ocean Discovery XPrize, a $7 million global competition designed to push exploration and mapping of the ocean floor. On February 16, the foundation announced the prize’s 21 semifinalists, a group that includes everyone from middle and high school students to maker-movement enthusiasts to professionals in the field. The next hurdle for the semifinalists? The first test of their technology, where they will have just 16 hours to map at least 20% of the 500-square kilometer competition area at a depth of 2,000 meters and produce a high-resolution map. (Watch Peter’s TED Talk)

An iconic album reimagined. Released two days before his death, David Bowie’s final album, Blackstar, is the unlikely choice for a classical reimagining. MIT professor Evan Ziporyn and composer Jamshied Sharifi recast the album in full for cellist Maya Beiser and the Ambient Orchestra. The arrangement premiered March 3 at MIT’s Kresge Auditorium. (Watch Maya’s TED Talk)

Two world premieres at Tribeca. Two TED speakers have documentaries premiering at the Tribeca Film Festival in April 2017. Journalist and filmmaker Sebastian Junger’s documentary Hell on Earth, directed with Nick Quested, chronicles Syria’s descent into harrowing civil war. Surf photographer Chris Burkard’s documentary Under an Arctic Sky follows six adventurous surfers who set sail along the frozen shores of Iceland in the midst of the worst storm the country has seen in twenty-five years. (Watch Sebastian’s TED Talk and Chris’ TED Talk)

Have a news item to share? Write us at and you may see it included in this weekly round-up.

CryptogramFriday Squid Blogging: Squid Cooking Techniques

Here are some squid cooking tips.

As usual, you can also use this squid post to talk about the security stories in the news that I haven't covered.

CryptogramPodcast Interview with Me

Here's a video interview I did at RSA on the Internet of Things and security.

Krebs on SecurityDahua, Hikvision IoT Devices Under Siege

Dahua, the world’s second-largest maker of “Internet of Things” devices like security cameras and digital video recorders (DVRs), has shipped a software update that closes a gaping security hole in a broad swath of its products. The vulnerability allows anyone to bypass the login process for these devices and gain remote, direct control over vulnerable systems. Adding urgency to the situation, there is now code available online that allows anyone to exploit this bug and commandeer a large number of IoT devices.

dahuaOn March 5, a security researcher named Bashis posted to the Full Disclosure security mailing list exploit code for an embarrassingly simple flaw in the way many Dahua security cameras and DVRs handle authentication. These devices are designed to be controlled by a local Web server that is accessible via a Web browser.

That server requires the user to enter a username and password, but Bashis found he could force all affected devices to cough up their usernames and a simple hashed value of the password. Armed with this information, he could effectively “pass the hash” and the corresponding username right back to the Web server and be granted access to the device settings page. From there, he could add users and install or modify the device’s software. From Full Disclosure:

“This is so simple as:
1. Remotely download the full user database with all credentials and permissions
2. Choose whatever admin user, copy the login names and password hashes
3. Use them as source to remotely login to the Dahua devices

“This is like a damn Hollywood hack, click on one button and you are in…”
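The reason “passing the hash” works here is that the server compares whatever the client sends directly against its stored hash, so the hash itself becomes the password. A minimal Python sketch of that broken scheme (all names and the hash algorithm are hypothetical illustrations, not Dahua’s actual code):

```python
import hashlib

# Server side: the user database stores only a hash of each password.
stored_hash = hashlib.md5(b'admin123').hexdigest()

def broken_login(client_value: str) -> bool:
    """Flawed check: the server accepts the stored hash itself as the credential.
    Anyone who downloads the user database can log in without knowing a password."""
    return client_value == stored_hash

# An attacker who leaked the credential dump never needs the plaintext:
leaked_hash = stored_hash            # taken from the exposed user database
assert broken_login(leaked_hash)     # access granted with the hash alone
```

A correct design would make the client prove knowledge of the password (e.g. a salted challenge-response), so that a leaked hash is not itself a usable credential.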

Bashis said he was so appalled at the discovery that he labeled it an apparent “backdoor” — an undocumented means of accessing an electronic device that often only the vendor knows about. Enraged, Bashis decided to publish his exploit code without first notifying Dahua. Later, Bashis said he changed his mind after being contacted by the company and agreed to remove his code from the online posting.

Unfortunately, that ship may have already sailed. Bashis’s exploit code already has been copied in several other places online as of this publication.

Asked why he took down his exploit code, Bashis said in an interview with KrebsOnSecurity that “The hack is too simple, way too simple, and now I want Dahua’s users to get patched firmware’s before they will be victims to some botnet.”

In an advisory published March 6, Dahua said it has identified nearly a dozen of its products that are vulnerable, and that further review may reveal additional models also have this flaw. The company is urging users to download and install the newest firmware updates as soon as possible. Here are the models known to be affected so far:


It’s not clear exactly how many devices worldwide may be vulnerable. Bashis says that’s a difficult question to answer, but that he “wouldn’t be surprised if 95 percent of Dahua’s product line has the same problem,” he said. “And also possible their OEM clones.”

Dahua has not yet responded to my questions or request for comment. I’ll update this post if things change on that front.

This is the second time in a week that a major Chinese IoT firm has urgently warned its customers to update the firmware on their devices. For weeks, experts have been warning that there are signs of attackers exploiting an unknown backdoor or equally serious vulnerability in cameras and DVR devices made by IoT giant Hikvision.

Writing for video surveillance publication IPVM, Brian Karas reported on March 2 that he was hearing from multiple Hikvision security camera and DVR users who suddenly were locked out of their devices and had new “system” user accounts added without their permission.

Karas said the devices in question all were set up to be remotely accessible over the Internet, and were running with the default credentials (12345). Karas noted that there don’t appear to be any Hikvision devices sought out by the Mirai worm — the now open-source malware that is being used to enslave IoT devices in a botnet for launching crippling online attacks (in contrast, Dahua’s products are hugely represented in the list of systems being sought out by the Mirai worm.)

In addition, a programmer who has long written and distributed custom firmware for Hikvision devices claims he’s found a backdoor in “many popular Hikvision products that makes it possible to gain full admin access to the device,” wrote the user “Montecrypto” on the IoT forum IPcamtalk on Mar. 5. “Hikvision gets two weeks to come forward, acknowledge, and explain why the backdoor is there and when it is going to be removed. I sent them an email. If nothing changes, I will publish all details on March 20th, along with the firmware that disables the backdoor.”

According to IPVM’s Karas, Hikvision has not acknowledged an unpatched backdoor or any other equivalent weakness in its product. But on Mar. 2, the company issued a reminder to its integrator partners about the need to be updated to the latest firmware.

A special bulletin issued Mar. 2, 2017 by Hikvision. Image: IPVM


“Hikvision has determined that there is a scripted application specifically targeting Hikvision NVRs and DVRs that meet the following conditions: they have not been updated to the latest firmware; they are set to the default port, default user name, and default password,” the company’s statement reads. “Hikvision has required secure activation since May of 2015, making it impossible for our integrator partners to install equipment with default settings. However, it was possible, before that date, for integrators to install NVRs and DVRs with default settings. Hikvision strongly recommends that our dealer base review the security levels of equipment installed prior to June 2015 to ensure the use of complex passwords and upgraded firmware to best protect their customers.”


I don’t agree with Bashis’s conclusion that the Dahua flaw was intentional; it appears that the makers of these products simply did not invest much energy, time or money in building security into the software. Rather, security is clearly an afterthought, bolted on after the fact, which is why nobody should trust these devices.

The truth is that the software that runs on a whole mess of these security cameras and DVRs is very poorly written, and probably full of more security holes just like the flaw Dahua users are dealing with right now. To hope or wish otherwise given what we know about the history of these cheap electronic devices seems sheer folly.

In December, KrebsOnSecurity warned that many Sony security cameras contained a backdoor that can only be erased by updating the firmware on the devices.

Some security experts maintain that these types of flaws can’t be easily exploited when the IoT device in question is behind a firewall. But that advice just doesn’t hold water for today’s IoT cameras and DVRs. For one thing, a great many security cameras and other IoT devices will punch a hole in your firewall straight away without your permission, using a technology called Universal Plug-and-Play (UPnP).
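One way to see whether anything on your LAN is quietly advertising itself over UPnP is a plain SSDP discovery probe, sent to the protocol’s standard multicast address. A minimal Python sketch (the function names are mine; run it only on a network you own):

```python
import socket

# Standard SSDP multicast endpoint defined by the UPnP specification.
SSDP_ADDR, SSDP_PORT = '239.255.255.250', 1900

def build_msearch(timeout_s: int = 2) -> bytes:
    """Build a standard SSDP M-SEARCH request asking all UPnP devices to answer."""
    return ('M-SEARCH * HTTP/1.1\r\n'
            f'HOST: {SSDP_ADDR}:{SSDP_PORT}\r\n'
            'MAN: "ssdp:discover"\r\n'
            f'MX: {timeout_s}\r\n'
            'ST: ssdp:all\r\n'
            '\r\n').encode()

def discover(timeout_s: int = 2) -> list:
    """Send the probe and collect (ip, status-line) pairs from any responders."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout_s)
    try:
        sock.sendto(build_msearch(timeout_s), (SSDP_ADDR, SSDP_PORT))
        found = []
        while True:
            data, addr = sock.recvfrom(65507)
            found.append((addr[0], data.split(b'\r\n', 1)[0]))
    except socket.timeout:
        pass
    finally:
        sock.close()
    return found

# Usage (on your own network):
#   for ip, status in discover(): print(ip, status.decode(errors='replace'))
```

Every device that answers is one that will happily describe itself (and often open firewall ports via UPnP) to anything on the local network, which is a good starting list for firmware checks.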

In other cases, IoT products are incorporating peer-to-peer (P2P) technology that cannot be turned off and exposes users to even greater threats.  In that same December 2016 story referenced above, I cited research from security firm Cybereason, which found at least two previously unknown security flaws in dozens of IP camera families that are white-labeled under a number of different brands (and some without brands at all).

“Cybereason’s team found that they could easily exploit these devices even if they were set up behind a firewall,” that story noted. “That’s because all of these cameras ship with a factory-default peer-to-peer (P2P) communications capability that enables remote ‘cloud’ access to the devices via the manufacturer’s Web site — provided a customer visits the site and provides the unique camera ID stamped on the bottom of the devices.”

The story continued:

“Although it may seem that attackers would need physical access to the vulnerable devices in order to derive those unique camera IDs, Cybereason’s principal security researcher Amit Serper said the company figured out a simple way to enumerate all possible camera IDs using the manufacturer’s Web site.”

My advice? Avoid the P2P models like the plague. If you have security cameras or DVR devices that are connected to the Internet, make sure they are up to date with the latest firmware. Beyond that, consider completely blocking external network access to the devices and enabling a VPN if you truly need remote access to them. has a decent tutorial on setting up your own VPN to enable remote access to your home or business network; on picking a decent router that supports VPNs; and installing custom firmware like DD-WRT on the router if available (because, as we can see, stock firmware usually is some horribly insecure and shoddy stuff).

If you’re curious about an IoT device you purchased and what it might do after you connect it to a network, the information is there if you know how and where to look. This Lifehacker post walks through some of the basic software tools and steps that even a novice can follow to learn more about what’s going on across a local network.