Planet Russell

September 07, 2008

"Planet Debian"Kyle McMartin: CONFIG_PRINTK_TIME, what is time?

GMsoft reported a few weeks ago that kernels on his A500 were hanging on startup with CONFIG_PRINTK_TIME enabled. Knowing that all this does is prefix the kernel messages with a timestamp, I was interested to find out how this could possibly be causing a hard hang.

Obviously, the first thing to do is to try to reproduce the problem. But I was completely unable to reproduce it on my RP3440… How very strange. Ok, well, let’s poke at the A500 and see if it will happen there.

It does! Spooky…[1]

Well, now I was really interested. Let’s see what CONFIG_PRINTK_TIME actually does… In kernel/printk.c::vprintk, around the if (printk_time) section, we see that after printing the priority level tag, such as KERN_INFO, etc., we attempt to print a timestamp obtained with cpu_clock. The returned value from cpu_clock is in nanoseconds, so before it’s printed, it is munged into two smaller integers, the whole-seconds portion, and the decimal portion.
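As a rough illustration of that split (a Python sketch of the arithmetic only, not the kernel's C code), the nanosecond value can be divided into whole seconds and a remainder. The name format_printk_time and the exact "[seconds.microseconds]" layout are assumptions made for this example.

```python
NSEC_PER_SEC = 10**9

def format_printk_time(ns):
    """Split a nanosecond count into whole seconds and a fractional part."""
    # The quotient is the whole-seconds portion; the remainder is the
    # leftover nanoseconds that form the decimal portion.
    seconds, nanosec_rem = divmod(ns, NSEC_PER_SEC)
    # Assumed layout: seconds padded to 5 columns, remainder shown in microseconds.
    return "[%5d.%06d] " % (seconds, nanosec_rem // 1000)

print(format_printk_time(1234567890))
```

The point to notice is that a division sits on the path of every timestamped message, including the very first one.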

Ok, this gives us a great place to start looking for ways PA-RISC could be tripping up on these codepaths. The first was a fairly fruitless search of the cpu_clock call chain, which turned up nothing suspicious, aside from a maze of CONFIG_ options. Turned out, on non-x86, this code reduced into some fairly trivial stuff, none of which could really have been causing a hang.

However, we now had a basis for a fairly good hunch. If it wasn’t the cpu_clock going awry, the do_div routine or sprintf must have been causing it. A quick boot-test to comment out the do_div routine and replace it with a fixed value resulted in a working system. Hooray.

Then it hit me like a freight train, when I saw what was being printed. This was the banner line… the very first thing printed in init/main.c, right after jumping into the kernel in virtual mode at start_kernel.


Linux version 2.6.27-rc5-00283-g70bb089 (kyle@shortfin) (gcc version 4.3.1 (GCC) ) #2 SMP Sat Sep 6 19:45:05 PDT 2008
FP[0] enabled: Rev 1 Model 20
The 64-bit Kernel has started…
console [ttyB0] enabled
Initialized PDC Console for debugging.

Seeing the “FP[0] enabled” line immediately caused me to smack my head at the obviousness of the problem. We were attempting to do a division, which, because of how the kernel and libgcc are compiled, is attempting to use the fpu. However, this is faulting on the very first printk in the kernel, well before any of the architecture-specific initialization is done. A quick hack removing any printks before we initialized the fpu fixed the problem as well.

But, dirty hacks are not appropriate for mainline. I thought long and hard about a nice way to fix this, but, really, open coding firmware calls in assembly didn’t strike my fancy. There is, however, another easy way to solve it.

I ended up replacing the jump to start_kernel in head.S with my own function to turn on the fpu, and called start_kernel from there. Kind of ugly, but at least the fix is entirely contained in arch/parisc, instead of leaking all over the tree. The patch is available, but I’ve been too busy to push it this last release-cycle (and didn’t really want to tempt fate by pushing a not-quite-actually-serious-fix outside of -rc1 time).

This has been another post brought to you by the maintainer of an inconsequential architecture. We do hope you enjoy it.

1. This ended up being due to either 1) the fpu being enabled by firmware on the PA8800, or 2) the fact that I was doing warm resets instead of cold starts.

"ProBlogger"16 Important but Potentially Distracting Blogging Tasks

Have you ever had one of those days where you set aside time to blog and, while you spend the whole time busily doing ‘stuff’, you don’t end up actually writing anything?

I had one of those days this last week. After what felt like a busy day of ‘work’ I realized I’d not actually produced a single blog post.

As I looked back over my day and the things that I’d done it struck me that there are a lot of tasks that bloggers do that are important - but that can at times become distracting from… well… writing posts… the core task of any blogger.

16 Important but Potentially Distracting Blogging Tasks

Following are 16 potentially distracting tasks for bloggers (note, I’m not saying that any of these are not important or worthwhile, just that they can actually become a distraction if we allow ourselves to become sidetracked by them):

  1. Social Messaging - Twitter, Plurk, Friendfeed, Pownce…. (add your favorite micro blogging/social messaging service here). Each can suck up your time if you don’t get focused and put some boundaries around them.
  2. Social Bookmarking - many bloggers become somewhat obsessed with writing posts for and then gathering votes on social media sites like Digg, StumbleUpon, Yahoo Buzz, Reddit etc.
  3. Social Networking - building profiles and interacting on Facebook, LinkedIn, MySpace etc. - all useful in building a brand and profile as a blogger, but potentially a distraction.
  4. Blog Design - blog design is important in creating a first impression, but when you find yourself tweaking it, reworking it and planning your next one more than actually writing content for your blog, you might be in trouble.
  5. SEO - like blog design there always seems to be something you could do a little better when it comes to optimizing a blog for search engines. It can be worth your time to do some of this, but one of the most effective ways of doing SEO is to write content that hits the spot with readers.
  6. Reading other Blogs in Your Niche - yet another great use of time, but many bloggers spend so much time on other people’s blogs connecting, leaving comments and even writing about them that they fail to write anything unique on their own.
  7. Reading about How to Blog - this might seem strange coming from a blogger who writes about blogging, but from time to time a blogger comes to me for advice on how to improve their blog who has done so much learning about blogging that my encouragement to them is simply to stop reading about it and start doing it.
  8. Guest Posting - I am a big fan of using guest posting on other people’s blogs to expand your profile and grow your readership - however the best way to utilize guest posting is to have great content on your own blog for the new readers you engage with to see when they come to visit.
  9. Interacting with Readers - this is one that I hesitate to write about because I’m a firm believer in allocating time to spend one on one with readers - however as a blog grows it gets more and more difficult to do. There comes a time where most bloggers need to decide how to strike a balance on this front - boundaries and processes can really help.
  10. Networking with other bloggers - another great way to build brand and traffic to your own blog is to connect with other bloggers in your niche - however there are millions of blogs ‘out there’ and it can be an endless task.
  11. Monetization - finding and testing ad networks and affiliate programs can take a lot of time. Then optimizing them for your blog and tracking the results and extending your earning potential by finding private sponsorships and ad sales can really eat up even more of your time.
  12. Starting New Blogs - diversification is an important and worthwhile part of many bloggers’ development, however I come across some bloggers who start too many blogs too quickly and don’t give their early ones time to get going and develop before they branch out.
  13. Analyzing Stats - one of the biggest potential time suckers, and one that many bloggers become distracted by at different times, is analyzing stats. Sure, you can learn a great deal from looking at who is coming to your blog, where they come from and what they do when they arrive - but when you do it all day every day, it can be a habit that takes you away from your blogging.
  14. Projects/Competitions/Memes - many bloggers wanting to run a competition or project on their blog don’t realize just how much work it can be to manage (or how hard it can be to get them working). They can bring a lot of life to a blog, but they can also suck your (and your readers’) attention away from your core blogging.
  15. Dealing with Trolls and Troublemakers - it is SO easy to get drawn into passionate (yet pointless) arguments with other bloggers and readers that can leave you emotionally drained and having wasted hours upon hours of your time. While at the time it seems so important to respond - many times it’s best just to learn to hold it in.
  16. Tracking down copyright violations - unfortunately in the medium we operate in there are people who scrape the content of others, whack ads on it and call it their own. While it can be important to track these copyright violations down - the statement ‘how long is a piece of string’ comes to mind, and some bloggers spend so much time tracking splogs down, issuing DMCA legal notices and attempting to get the content removed that they have little time for much else.

Let me reiterate - there’s nothing wrong with any of these activities…. BUT….

In fact, at different times I’ve recommended and given tips on all of them on this blog! However - this post is about balance and priorities.

While these are all great activities the danger is in those times when they sidetrack us from other core aspects of our blogging.

In my own blogging I try to guard against becoming distracted by:

  • Having goals (both long term but also daily goals)
  • Being aware of how I’m spending time (periodically throughout each day I stop and ask myself if I’m on track)
  • Setting time aside for the most important tasks (I put aside three mornings a week specifically for content creation - I block out this time and remove other distractions for these times)

What distracts you most from blogging? How do you keep yourself on track?

"Planet Debian"Wouter Verhelst: Octocores are fun

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND         
26205 wouter    20   0 21208 3940 2264 R  100  0.1   6:39.38 4 povray           
26220 wouter    20   0 21204 3988 2264 R  100  0.1   6:39.54 6 povray           
26222 wouter    20   0 21208 3960 2264 R  100  0.1   6:39.29 5 povray           
26223 wouter    20   0 21208 3948 2264 R  100  0.1   6:39.29 7 povray           
26214 wouter    20   0 21212 4008 2264 R  100  0.1   6:39.02 3 povray           
26224 wouter    20   0 21204 3924 2264 R  100  0.1   6:39.28 2 povray

Generating some test clips for a project at work...

"Planet Sysadmin"Security news roundup: Webcam voyeur gets 90 days

This week’s security events include news of a vulnerability in the 64-bit edition of OpenOffice, a privilege escalation flaw in Samba, a virus infection on the International Space Station, and the arrest of yet another webcam voyeur.

——————————————————————————————————————-

Vulnerability found in 64-bit version of OpenOffice

The current version of OpenOffice has a flaw that allows an attacker to inject code.  Fortunately, it is specific to the 64-bit version of the office suite.  While the vulnerability has already been remedied in the repositories, the tricky bit here is for users of 64-bit binary releases, as these normally come via Linux distributors and not from the developers.

For its part, Red Hat has already published new 64-bit versions, though not every distributor has done so yet.  You can check out the security update from Red Hat, or read about the bug as outlined at the OpenOffice site.

Privilege escalation flaw found in Samba

The Samba development team has released a new version of the open source Samba that resolves a privilege escalation vulnerability. Using the flaw, authenticated users who are logged into the system can potentially edit the group_mapping.ldb file to map any SID to root or to other users or groups. Versions of Samba from 3.2.0 to 3.2.2 are affected by this flaw.

As a temporary workaround, the file permissions to the group_mapping.ldb file can be manually set to 0600. In the meantime, two patches addressing this defect can be found on the Samba security site here.  Samba administrators are advised to upgrade to 3.2.3 or apply the patch as soon as possible.
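For administrators who script such changes, the permission workaround might look like the sketch below. The helper name restrict_to_owner is made up for this example, and the location of group_mapping.ldb is not given here, so the path has to be supplied by the caller.

```python
import os
import stat

def restrict_to_owner(path):
    """Set a file's mode to 0600 (owner read/write only) and return the new mode."""
    os.chmod(path, 0o600)
    # Read the mode back so the caller can confirm the change took effect.
    return stat.S_IMODE(os.stat(path).st_mode)
```

This only narrows who can edit the file; upgrading or patching is still the actual fix.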

You can read more about this vulnerability here.

Virus infection found on the International Space Station

In a somber reminder that no computer equipment is safe from the scourge of computer malware, a computer virus has been discovered on the International Space Station (ISS).  As the ISS has no direct Internet connection, the infection could only come from a newly introduced laptop, or removable media.  NASA confirmed the infection late last month.

The virus in this case is the W32.Gammima.AG worm, a fairly rare virus that gathers personal information and was first seen in August 2007.  Indeed, it had spread to several laptops before it was discovered.  Apparently, it wasn’t the first time that a virus has been discovered on the ISS either.

Excerpt from heise Security:

However you might wonder why these measures are only now being taken if this, as NASA says, is not the first time it has happened. It appears that the ISS has no unified anti-virus policy in place and that several laptops on-board had no anti-virus software installed. This seems surprising since any virus in a human life critical application, such as in space, can be deadly, but even when found in non-critical systems, a virus on a space station can cost millions of dollars.

Now, what really caught my attention was this statement by NASA spokesperson Kelly Humphries.  When pressed on whether the infected laptops could be connected to the same network as mission-critical systems, Humphries responded, “I don’t know and even if I did, I wouldn’t be able to tell you for IT security reasons”.  Wow, talk about security by obfuscation.

Yet another webcam voyeur arrested

A peeping tom who videotaped his 19-year-old stepdaughter from a secretly installed webcam located in her bedroom ceiling was sentenced to 90 days in jail.  He was apparently discovered after the girl looked through her stepfather’s laptop and discovered seven videos of herself in her bedroom. From the vantage point of the videos, she quickly found the webcam - and took both webcam and laptop to the police.

Excerpt from Edmonton News:

Court heard the videos showed the young woman playing with her pet, grooming herself and getting dressed and included images of her in various states of undress.

This brings to mind another case earlier last month in which a 47-year-old computer technician was jailed for four years for hacking into a teenage girl’s webcam to spy on her.

While it can be argued that technology is only an enabler, the availability and affordability of webcams (embedded or standalone) has resulted in a rash of voyeur cases around the world. Of particular concern are the built-in webcams found in practically every new laptop sold on the market.  Perhaps laptop manufacturers can build covers that physically block off these built-in webcams to allay concerns of trojans opening the way to voyeurs.

Feel free to discuss the various security events here.


"Planet Sysadmin"Set it and forget it: Tether your Windows Mobile 6 Phone to Linux

I have a love/hate relationship with my phone - an HTC PPC6800. I can't live without it - I can check my work email from anywhere, and surf the web. While I've tried many PDAs through the years, none of them have stuck, because I got tired of lugging them around. I always have my phone with me, so my smartphone has made me much more organized. My wife loves it because I can remember all the upcoming appointments. Yet, I hate it. Its UI is horrible. It locks up and needs rebooting, and I feel dirty using an M$ product. Well, I found one more reason to like it. I can tether my Ubuntu laptop to my phone and get Internet access from just about anywhere. This howto is for Ubuntu, but it should work for any distro that uses bluez-utils. Note that I briefly tried to get my laptop tethered via USB, but I found several comments that it wouldn't work without a custom kernel module. Bluetooth is easier, works out of the box, and is much cooler besides ;-)

"Planet Sysadmin"Why your main program should be importable

When I first started coding in Python, I didn't know what I was doing. So I structured my Python programs the way I would write Bourne shell scripts or Perl programs, writing functions as necessary and useful but otherwise putting all of the logic and code in the program's file outside of functions (in what I now call 'module scope').

This is a perfectly rational structure for Python programs, and even works; my programs ran fine and were perfectly functional. But it was also a bad mistake, as I slowly discovered later; what you really want to do is put all of your code in functions (and then start one with magic).

The problem that makes it a mistake is that a program written this way cannot be imported as if it was just another Python module; if you try, the program's code immediately starts running and explosive things start happening. There are at least two reasons why this is unfortunate:

  • various useful tools like pychecker rely on importing your code in order to pick through it. This is arguably a mistake on pychecker's part and they should be using a more robust mechanism, but it's how they work right now, so if you want to use them (and pychecker is usually quite useful) you have to live with it.

    (Discovering pychecker and trying to use it on my programs was how I began to realize the mistake I'd made.)

  • being able to import your main program gives you a handy method of testing bits of it from an interactive interpreter.

    To make this really work you need to code your program so that it calls sys.exit() as little as possible. If a function runs into a fatal error it should not do the usual 'call die() with an error message' thing; instead, it should raise an exception. Only the very top of the program should catch those exceptions and wind up calling sys.exit().

    (And if you don't like phase tracking, catching and wrapping exceptions can give you a nice method to add context to the error message that you'll wind up reporting.)
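As a small sketch of the wrapping idea (in current Python syntax, with load_config as a made-up example function and MyError as an assumed program-specific exception class), a low-level error can be caught and re-raised with added context rather than ending the program on the spot:

```python
class MyError(Exception):
    """Fatal, user-reportable error (assumed name for this example)."""

def load_config(path):
    """Return the contents of a config file, or raise MyError with context."""
    try:
        with open(path) as f:
            return f.read()
    except EnvironmentError as e:
        # Wrap the low-level error to say what we were doing,
        # instead of calling sys.exit() from deep inside the program.
        raise MyError("cannot load config %s: %s" % (path, e))
```

Because nothing here exits, load_config() can be called from an interactive interpreter and its failure inspected like any other exception.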

I'm sure that this is strongly suggested somewhere in the Python documentation and the smart people were aware of it from the start, but I missed it (to my regret with those early programs).

Oh yes, the magic you need to make your top level function start running when your program is actually run (instead of being imported) is:

if __name__ == "__main__":
    ... run code here ...

At the module scope, __name__ is normally the name of your module (well, the name it is being imported by). When Python is running your code because it has been directly handed to the interpreter, Python sets the name to "__main__" instead.
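One way to watch this happen without creating files by hand is the standard runpy module, which can execute a source file under a chosen name. In this sketch the file name demo_prog.py and the helper names_seen are made up, and running with run_name="demo_prog" only simulates what an import under that name would see.

```python
import os
import runpy
import tempfile

# A one-line "program" that records the name it was run under.
SOURCE = "seen_name = __name__\n"

def names_seen():
    """Run the same source as a module and as a program; return both names."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo_prog.py")
        with open(path, "w") as f:
            f.write(SOURCE)
        # run_path returns the resulting module globals as a dictionary.
        imported = runpy.run_path(path, run_name="demo_prog")["seen_name"]
        as_program = runpy.run_path(path, run_name="__main__")["seen_name"]
    return imported, as_program

print(names_seen())
```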

Sidebar: my current program structure

The program structure that I have wound up adopting for my own programs looks something like this:

import sys
def process(...):
    .....

def main(args):
    .....
    try:
        process(...)
    except EnvironmentError, e:
        die("OS problem: "+str(e))
    except MyError, e:
        die(str(e))
    ....

if __name__ == "__main__":
    main(sys.argv[1:])

The main() function parses the arguments, loads configuration files, and so on, and then calls process() with whatever arguments are appropriate for the program; process() actually starts to do work. To put it one way, main() does all the stuff that only has to be done when the program is being run as an actual program.

"Debian Times"mrxvt: Fast, light multitabbed terminal emulator

Article submitted by Hugo Carrer.

Like any other Debian user, I love writing obscure commands on my terminal. I also love having so many open terminals that I have to come up with a special system to find the one where my favorite obscure command is running.

To be able to enjoy this I need a very fast multitabbed terminal emulator: mrxvt.

Some of the things I like the most about mrxvt are, for example:

  • It is very fast and light.
  • Fast pseudo-transparency.
  • Background with your favorite images.
  • Highly configurable keyboard shortcuts.
  • You can have the same command typed on every tab at the same time. This feature is disabled by default. You can enable it by editing /etc/mrxvt/mrxvtrc and uncommenting the ToggleBroadcast macro (around line 171). After that, Ctrl+Shift+d toggles input broadcasting to all tabs.
  • Automatic or “by-hand” tab labeling.
  • It is independent of your desktop (no KDE or GNOME needed).
  • Did I mention that it is very fast and light?

After installing, it looks something like this:

a just installed mrxvt

You can change this rather old-fashioned look by copying the example config file from
/usr/share/doc/mrxvt-common/examples/mrxvtrc.sample.gz
and placing it, decompressed, in ~/.mrxvtrc

The file is full of comments helping you with the meaning of each option. Of course you can find all available options in the man page. Some useful shortcuts are Ctrl-shift-t to open a new tab and Ctrl-shift-m to show the menu.

So, after playing, trying and tweaking for a little while you can get a futuristic look for your terminals. Like this one of me sketching this article in an emacs session inside mrxvt (note all those beautiful tabs up there).

mrxvt in action

Downsides? Well, it depends on the kind of user:

  • No UTF-8 support.
  • It has no config menu.
  • You have to remember the shortcuts or read the config file every now and then.
  • And as with anything worth doing, to get things working the way you want to you’ll have to read through the man page and maybe scratch your head once or twice but it’ll work.

To sum up, it’s the perfect application to config during those boring rainy weekends and then show off to your friends at work.

mrxvt is available in Debian stable and in Ubuntu too.

"The Reid Report"The top ten reasons Oprah should tell everybody to kiss her entire ass

White women in a huff over Oprah's booking decisions.

The cat fur is flying over the non-story about Oprah supposedly banning Caribou Barbie from her show (recalling a previous cat-fight over Oprah not backing Hillary Clinton during the primaries.) This enterprising blog even offered five reasons O should let Sarah Palin on. Some of the comments the post elicited ranged from the profane, to the downright ridiculous:

When will the media openly and honestly report the facts about black crime, welfare rates, and IQ?

Visit C H l M P O U T . C O M

if you dare to open your mind and decide for yourself.

My name says it all.

We suspected it-Now We know it is TRUE
Oprah is a Racist & Hates White American Women

She sure does like their money though.
First time she supports a candidate and he is black?=Racist

First Republican VP candidate and it is a White woman and Oprach won't have a woman on her show about women? = Racist

---

Oprah should be ashamed of herself. Obama, the 50% white/50% black candidate, isn't the only one making history. How about the 1st female VP candidate on a Republican ticket? Whatever happened to "girl power", Oprah? Aren't women the ones who've made you a billionaire? Nice way to say thanks.

Pathetic.

---

Oprah is going to deny a woman who attracted 40 million viewers and is the first female vice presidential nominee in 24 years from appearing on her show? Unbelievable. I think Oprah is going to have mutiny on her hands. She is selling out all women again to support an African American man. She doesn't have to endorse Palin or even discuss politics, but to pretend as if her audience isn't interested in her story is absurd. I think Oprah is going to lose a lot of fans over this, it shows she is clearly biased and identifies and sympathizes more with the struggles of black people than she does with women. After observing this election process I think it's proven that women have a much greater struggle towards equality. What a shame that Oprah will sit on her hands once again when women need her support.

#1. Those who say that Oprah owed her support to Hillary Clinton and now owes it to Sarah Palin because they are women are rank hypocrites for criticizing her for supporting a black candidate.

#2. Those who claim Oprah owes white women for her success are blatant racists, whose sense of entitlement even extends to Oprah's well-EARNED success. I suppose she should show her appreciation by giving up her seat on the bus when a white woman wants to sit down?

#3. Clearly Oprah owes nothing to people who are so quick to turn on her, including drawing for the race card, when she doesn't toe the line by putting (white) women first.

#4. Anyone who thinks Sarah Palin MUST be allowed to go on Oprah's show, but doesn't mind that she refuses to go on actual news shows like "Meet the Press", or face reporters at all, is an authoritarian fool so nurtured on Fox News propaganda that you feel Palin should be worshipped into office instead of voted in on the basis of facts. By your logic, the geezer must immediately be booked on "Montel."

#5. If you don't like the way Oprah runs her show, don't watch it, or better yet, create a (right wing) show of your own. Oh, that's right, you have no talent, and instead have been bamboozled into watching Oprah's show all this time, thinking you were purchasing her eternal loyalty to white women along with her favorite things...

#6. Each of you is entitled to your political views, and so is Oprah. You don't see her out there telling you how stupid you are for voting for four more years of Bush policies just because they're dressed up in the guise of a geriatric old man and his pretty Alaskan nurse, do you?

#7. I thought dittoheads didn't believe in the fairness doctrine...

#8. If you're so hopped up on entertainers giving equal time to pols, maybe Kelsey Grammer and Bruce Willis could be forced to do a movie with Barack? It could be called "Die Hard with a guy who sounds British but really isn't and a guy Sean Hannity says is Muslim, but also isn't..."

#9. OPRAH IS NOT A REPORTER, AND HER SHOW IS NOT THE PLACE FOR ELECTION COVERAGE!

#10. Republicans have already established that they hate celebrities. Going on Oprah would further establish Sarah Palin as a celebrity, thereby making you hate her. And you can't hate her, wingers ... because she is your queen.

---
Banned books + lots of earmarks + abuse of office to fire brother-in-law + Alaska secessionist party + mayor of 9,000 = vice presidential material! Only in America...

"365 Tomorrows"Little Blue Pills

Author : Lokon

Richard was forty, paunchy and balding when he came home early and found Susan on the bed they shared. The thing on her and in her was a vibrating mass of warm rubberized orgasm; moving in and out of-across her, her eyes and ears were hidden behind the goggles flashing the holos of what Richard assumed to be one of her Romance novels. She neither saw him nor heard him, and Richard had a manic moment where he imagined she wouldn’t have cared either way. The discarded box it had arrived in professed it as ‘the best sex on the market’ Richard fingered the wedding band she had placed on his finger. His flesh bulged around the too tight metal. He left quietly.

Richard started taking pills. The blue pill made him hard on demand led to the brown pill to keep him going to the red pill to make him more aware of her and better. The pills brought want of the augments. They put little circuits in his head to help him remember dates and recite Shakespeare and Donne on command. At first they were to please her, and then they were just for him. The augments led to uploading, back ups, and gene therapy.

Susan aged and Richard grew to be more than he had been, muscles beginning to regrow and hair migrating from his back to the top of his head. “Darling,” Susan said on her 90th birthday, “die with me. We were not meant for more than we were given. Promise me that you will be human with me in the end.” Richard was 96 and looked 28, but said “Yes” as he promised to join the dying who were not to be wooed by the seductive murmurings of technologic immortality.

Richard was getting used to his new legs and eyes when he found Susan there. Susan was locked in a box in her best Sunday clothes, earth forming all around her wooden walls with a tombstone like a sundae’s cherry on top. Next to it was Richard’s marker, now only signifying the shell he’d discarded just before Susan had closed her eyes for good. “I am sorry dearest, I didn’t want to if I didn’t have to.”

"Planet Sysadmin"Scobleizer — Tech geek blogger » Blog Archive The Superbowl of Startups «

Scobleizer — Tech geek blogger » Blog Archive The Superbowl of Startups «:



"Blogging is NOT reporting. It’s the single voice of a person. When you read me here you are reading me the way I’d talk to you at a cocktail party. You’re hearing my opinions. If I’m doing ‘reporting’ then you’ll know, because of how I source it."


I really like this explanation for blogging. Maybe we need "Blogging is NOT reporting" T-shirts.

"Revealing Errors"Google Miscalculator

This post on a search engine blog pointed out a series of very strange and incorrect search results returned by Google's search engine. The engine is a very complicated "black box," and many of the errors described highlight and reveal some aspect of Google's search technology.

My favorite was this error from Google Calculator:

Error showing 1.14 as a result for eight days a week

The error, which has been fixed, occurred when users searched for the phrase "eight days a week" -- the name of a Beatles song, film, and sitcom.

Google Calculator is a feature of Google's search engine that looks at search strings and, if it thinks you are trying to ask a math question or a units conversion, will give you the answer. You can, for example, search for 5000 times 23 or 10 furlongs per fortnight in kph or 30 miles per gallon in inverse square millimeters -- Google Calculator will give you the right answers. While it would be obvious to any human that "eight days a week" was a figure of speech, Google thought it was a math problem! It happily converted 1 week to 7 days and then divided 8 by 7: roughly 1.14.
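The arithmetic behind that answer is easy to reproduce. The sketch below is only an illustration of reading "N days a week" as a ratio of two durations; the function name is made up and says nothing about how Google's software is actually written.

```python
DAYS_PER_WEEK = 7

def days_per_week_ratio(days, weeks):
    """Treat 'N days a week' as a division of two durations, both in days."""
    return days / (weeks * DAYS_PER_WEEK)

# Eight days divided by one week (seven days), rounded to seven places.
print(round(days_per_week_ratio(8, 1), 7))
```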

Clearly, the error reveals the absence of human judgment -- but we knew that about Google's search engine already. More intriguing is what this, combined with a series of other Google Calculator errors, might reveal about Google's black box software.

When Google launched its Calculator feature, it reminded me of GNU Units -- a piece of free/open source software written by volunteers and distributed with an expectation that those who modify it will share with the community. After playing with Google Calculator for a little while, I tried a few "bugs" that had always bothered me in Units. In particular, I tried converting between Fahrenheit and Celsius. Units converts between amounts of degrees (for example, a change in temperature). It does not take into account the fact that the two scales have different zero points, so it often gives people an unexpected (and apparently incorrect) answer. Sure enough, Google Calculator had the same bug.
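The difference between converting a change in temperature and converting an actual temperature can be sketched as follows. The function names are invented for this example and nothing here is taken from GNU Units or Google Calculator.

```python
def fahrenheit_delta_to_celsius(delta_f):
    """Convert a temperature difference: only the degree size changes."""
    return delta_f * 5.0 / 9.0

def fahrenheit_temp_to_celsius(temp_f):
    """Convert an absolute temperature: shift the zero point, then rescale."""
    return (temp_f - 32.0) * 5.0 / 9.0

# The same input gives two different answers depending on the question asked.
print(fahrenheit_delta_to_celsius(212))
print(fahrenheit_temp_to_celsius(212))
```

Someone asking what 212 degrees Fahrenheit is in Celsius expects the second answer; a converter that only rescales gives the first.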

Now it's possible that Google implemented their system similarly and ran into similar bugs. But it's also quite likely that Google just took GNU Units and, without telling anyone, plugged it into their system. Google might look bad for using Units without credit and without assisting the community but how would anyone ever find out? Google's Calculator software ran on the Google's private servers!

If Google had released a perfect calculator, nobody would have had any reason to suspect that Google might have borrowed from Units. One expects unit conversion by different pieces of software to be similar -- even identical -- when it's working. Identical bugs and idiosyncratic behaviors, however, are much less likely and much more suspicious.

Given the phrase "eight days a week", Units says "1.1428571."

"Planet Debian"Benjamin Mako Hill: Happy Birthday GNU

Nearly a week after its release, I suspect most of my audience has seen the FSF's Freedom Fry video of Stephen Fry wishing the GNU project and the free software movement a happy birthday. While I'm not usually one for birthdays, I thought I'd at least reflect on it briefly. Certainly, it's a wonderful video -- of which Matt Lee and others at the FSF should be proud. But it's the fact that the GNU project is now twenty-five years old that is truly noteworthy.

/copyrighteous/images/freedom_fry.png

Wikipedia says that a generation (i.e., the average interval between the birth of parents and of their offspring) is somewhere between 25 and 30 years in most of the Western world. Twenty-five years isn't just a big number divisible by five, it marks a generational shift.

Certainly, GNU has matured and accomplished wonderful things in the last quarter-century. More importantly perhaps, it's produced wonderful progeny. It has spawned hundreds of thousands of free software projects, thousands of free or nearly-free operating systems, and an unbelievably vibrant global free and open source software community. Beyond the software realm, the free culture movement, most free licensing projects, and much of the access to knowledge movement can trace a connection back to GNU. We are living, and building, a new generation of the free software movement.

It's not been an entirely smooth ride, feelings have been hurt, and it's hard for GNU's proponents -- myself included -- to not wince at some of what has been done in GNU's name and because of its example. But even cynics must admit: the world is an undeniably better place because of GNU and the efforts and ideas that it has motivated.

I turn 28 in December and have spent my entire computing life in a world where free software was a viable option and an active form of resistance. Here's to another generation! May we be half as productive and positive as the last!

September 06, 2008

"Planet Sysadmin"Slightly Advanced Python: Some Python Internals

"Planet Sysadmin"The Python Object Model

"The Reid Report"Sarah Palin: not ready to take questions on day one

So, we're supposed to take Sarah Palin seriously as a potential vice president of the United States, AND accept that she's not ready to take questions from reporters ... because she might make a mistake???

"Planet Debian"Ben Hutchings: chmod -x considered harmful

I discovered an interesting "feature" of chmod(1), which caused a package build to fail. According to the GNU manual page, if no letters are used before a "-" or "+", "the effect is as if a were given, but bits that are set in the umask are not affected." The command will also fail with an error message when it does this!

The Single Unix Specification says this is correct, though there is some ambiguity over whether the exit status should be 0 or not.

Anyway, the result is that it is generally a bad idea to call chmod in this way in a script. Always specify who the permission changes should apply to.
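
The rule quoted from the manual page can be modelled with a little bit arithmetic. This is only a sketch of that rule, not a call to chmod(1); the function names are mine and the umask values are example inputs I picked.

```python
# Model of the quoted rule: with no "who" letters, "-x" acts like "a-x"
# except that bits set in the umask are left alone.
X_BITS = 0o111  # execute bits for user, group, and other

def chmod_minus_x(mode, umask):
    """Mode after a bare `chmod -x` under the given umask (modelled)."""
    affected = X_BITS & ~umask
    return mode & ~affected

def chmod_a_minus_x(mode):
    """Mode after the explicit `chmod a-x`, which ignores the umask."""
    return mode & ~X_BITS

print(oct(chmod_minus_x(0o755, 0o022)))  # umask leaves all x bits eligible
print(oct(chmod_minus_x(0o755, 0o027)))  # umask protects the "other" x bit
print(oct(chmod_a_minus_x(0o755)))
```

With a umask that covers any execute bit, the bare form leaves that bit set while the explicit form clears it, which is the kind of surprise that can break a build script.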

"ProBlogger"What do you get if you put Shoemoney, CopyBlogger, Chow, Johnson, Kukral and ProBlogger in the same Room?

OK - that heading sounds like a bad joke (and it could be) but I had confirmation this week that Jeremy from ShoeMoney is joining our ‘make money online with a blog’ panel at Blog World Expo.

This makes the lineup - Brian Clark from CopyBlogger, John Chow, Zac Johnson, Jim Kukral, ShoeMoney and myself. It’s going to be some panel. I’m very excited because while I’ve met Jeremy and Brian before the other four will be first time meetings for me.

The only challenge is going to be that the session is only 60 minutes and with 6 of us it’s going to be hard to fit it all in! Hopefully we can find some time together outside the session to create some more good connecting and learning!

Register with the code PBVIP (or any of the others that are going around at the moment) and you get 20% off. If you’re coming - don’t forget to get there a day early so you can come to b5’s free day of training on the 18th.

"Planet Sysadmin"TXS vs DJB - Round One!

vs

That's DJB on the left and TXS on the right. I've never seen either of them in the same room at the same time, so there is still a chance they are the same entity. Maybe TXS is out teaching classes, writing software, or doing other academic things when he should be sleeping or maybe DJB is out having TXS' life while he should be sleeping. This could all be one big real life Fight Club scenario (without the fighting). The world may never solve the mystery that is TXS vs DJB.

If you're in the insanely small population that has not seen Fight Club or read the book, then I'm sorry I ruined it for you. Maybe you should get with the modern era if you're going to continue to read my drivel.

I dug that picture of DJB out from archive.org.

"MegaTokyo"Comic [1154] "debugging"

"Planet Sysadmin"Internal Security Staff Matters

I read Gunter Ollmann's post in the IBM ISS blog with interest today. Gunter is "Director Security Strategy, IBM Internet Security Systems," so he is undoubtedly pro-outsourcing. Here is his argument:

[S]ecurity doesn’t come cheap. While individual security technologies get cheaper as they commoditize, the constant influx of new threats drives the need for new classes of protection and new locations to deploy them...

If you were to examine a typical organizations IT security budget, you’d probably see that the majority of spend isn’t in new appliances or software license renewals, instead it’ll lie in the departments staffing costs...

This is at odds with the way most organizations normally deal with specialized and professional skill requirements... Just about every organization I deal with (including some of the biggest international companies) relies upon external agencies to provide these specialist services and consultancy – as and when required – it’s more cost effective that way.

With that in mind, why are organizations building up their own highly-trained (and expensive) specialist internal security teams? Granted, some of the security technologies being deployed by organizations are relatively complex, but do they really require a Masters degree and CISSP certified experts to babysit them full-time...

Nowadays you can tap in an incredibly broad range of expertise – ranging from hard-core security researchers capable of helping you evaluate the security of new products you’re thinking of buying and deploying throughout your enterprise, through to 24x7 security sentinels; so knowledgeable about the security product you’ve deployed that they’re capable of guaranteeing protection with money-back SLA’s...

Organizations should take a closer look at their security budgets and evaluate whether they’re getting the right value out of their internal teams and whether their skills investment meets the daily need of the business.
(emphasis added)

By highlighting the focus on "security products," you can probably predict my response to Gunter's post. Sure, you can hire experts that may (or may not) be cheaper than internal staff, and they may be smarter in individual products or even defensive tactics, but they are poor with respect to the most critical aspect of modern security: business knowledge. It does not matter if you are the world's greatest packet monkey if you 1) don't know what matters to a business; 2) don't know business systems; 3) don't know what is normal for a business... do I need to continue?

This is the biggest challenge I see for consultants, having been one and having hired them. It's easier to hire a consultant to help configure a security product than it is to figure out if that product is even needed, which to buy, how to get approval and business buy-in, how to support it operationally, and a dozen other decisions.

I agree that certain specialized tasks merit outside support. That list changes from organization to organization. However, beware arguments like Gunter's.

"ProBlogger"How to Live Blog an Event

The subject of live blogging has come up for me three times in the last 24 hours so I thought it might make a good reader discussion.

“How would you go about live blogging at a conference?”

That was the question I was just asked - how would you do it? What tools would you use? What strategies would you use to get content online?

PS: as I was about to hit publish on this question an article on this very topic appeared in my RSS feed on Web Worker Daily - Preparing to Live Blog an Event. It’s got some good tips - but what would you add?

"Planet Sysadmin"The Analyzer Charged Again

I read a name I hadn't seen in years today when I read Kim Zetter's story Israeli Hacker Known as "The Analyzer" Suspected of Hacking Again:

Canadian authorities have announced the arrest of a 29-year-old Israeli named Ehud Tenenbaum whom they believe is the notorious hacker known as "The Analyzer" who, as a teenager in 1998, hacked into unclassified computer systems belonging to NASA, the Pentagon, the Israeli parliament and others.

Tenenbaum and three Canadians were arrested for allegedly hacking the computer system of a Calgary-based financial services company and inflating the value on several pre-paid debit card accounts before withdrawing about CDN $1.8 million (about U.S. $1.7 million) from ATMs in Canada and other countries. The arrests followed a months-long investigation by Canadian police and the U.S. Secret Service.


The Analyzer was the "mastermind" behind Solar Sunrise, one of the original "so easy a Caveman could do it" intrusions -- back in 1998. Solar Sunrise was huge and it was one of several very rude awakenings I remember while serving in the Air Force that decade.

Seeing The Analyzer back in law enforcement custody reminds me of the post I made about Max Ray Butler and somewhat of my post Intruders Selling Security Software. It's all about trust.

"Planet Sysadmin"Bejtlich Keynote at 1st ACM Workshop on Network Data Anonymization

Brian Trammell and Bill Yurcik were kind enough to ask me to deliver the keynote at the 1st ACM Workshop on Network Data Anonymization (NDA 2008). The one day event takes place 31 October 2008 at George Mason University in northern VA. My talk will discuss the trials and tribulations of OpenPacket.org, and changes planned for the project.

"Planet Sysadmin"Solaris Core Analysis, Part 2: Solaris CAT

In Part 1 we discussed core analysis in general and some basic mdb commands for high level investigation. When you dig deeper things can get confusing and complex because everything is referenced by address. This is where the Solaris Crash Analysis Tool comes in.

Solaris CAT has been around for a long time, but only as of version 5.0 released on June 18th of this year has it been available for Solaris X86/X64. You can find the Solaris CAT 5.0 Release Notes here.

To get started, download CAT 5.0, uncompress and install the package:

# bunzip2 SUNWscat5.0-GA-i386.pkg.bz2
# pkgadd -G -d ./SUNWscat5.0-GA-i386.pkg 

The following packages are available:
  1  SUNWscat     Solaris Crash Analysis Tool (5.0 GA SV4622M)
                  (i386) 5.0

Select package(s) you wish to process (or 'all' to process
all packages). (default: all) [?,??,q]: 1

Processing package instance  from 

Solaris Crash Analysis Tool (5.0 GA SV4622M)(i386) 5.0
...

The package will, by default, install into /opt/SUNWscat. There are two binaries we're really interested in, found in the bin/ directory: scat and blast. The scat tool is the CLI interface to Solaris CAT and provides a shell which is a human friendly re-implementation of mdb (no "::" prefixing commands, etc.) The blast tool is a really nice Java GUI interface to the CLI which adds a lot of "just click here" functionality and is excellent for testing and playing around. I highly recommend you point your browser at /opt/SUNWscat/docs/index.html, which includes some minimal but extremely useful HTML documentation.

Author's note: I'm resisting a "scat" joke with amazing strength. Seriously... resisting.... so.... hard....

We'll focus on the CLI here. Invocation is a little unusual: add /opt/SUNWscat/bin to your path, then change to the directory containing your dumps (usually /var/crash/hostname/). For the .0 dumps use "scat 0", for the .1 dumps use "scat 1", and so on. You'll find the "online help" within the CLI exceptional. Let's look:

# export PATH=$PATH:/opt/SUNWscat/bin
# cd /var/crash/ev2-r01-s10/
# ls -l
total 14205330
-rw-r--r--   1 root     root           2 Aug 25 07:49 bounds
-rw-r--r--   1 root     root     1444762 Aug 25 07:43 unix.0
-rw-r--r--   1 root     root     7268106240 Aug 25 07:49 vmcore.0
# scat 0

  Solaris[TM] CAT 5.0 for Solaris 11 64-bit x86
    SV4622M, Jul  3 2008

  Copyright © 2008 Sun Microsystems, Inc. All rights reserved.
  Use is subject to license terms.

  Feedback regarding the tool should be sent to SolarisCAT_Feedback@Sun.COM
  Visit the Solaris CAT blog at http://blogs.sun.com/SolarisCAT

opening unix.0 vmcore.0 ...dumphdr...symtab...core...done
loading core data: modules...symbols...ctftype: unknown type struct panic_trap_info
CTF...done

core file:      /var/crash/xxxxxxxx/vmcore.0
user:           Super-User (root:0)
release:        5.11 (64-bit)
version:        snv_67
machine:        i86pc
node name:      xxxxxxxxxxxxxxxxxx
system type:    i86pc
hostid:         xxxxxxxx
dump_conflags:  0x10000 (DUMP_KERNEL) on /dev/dsk/c0t0d0s1(24.0G)
time of crash:  Mon Aug 25 07:41:00 GMT 2008 (core is 13 days old)
age of system:  91 days 22 hours 49 minutes 50.97 seconds
panic CPU:      1 (8 CPUs, 31.9G memory)
panic string:   page_free pp=ffffff0007243bd8, pfn=11228e, lckcnt=0, cowcnt=0 slckcnt = 0

sanity checks: settings...vmem...
WARNING: FSS thread 0xffffff097d1e3400 on CPU2 using 99%CPU
WARNING: FSS thread 0xffffff09fddbab40 on CPU3 using 99%CPU
sysent...clock...misc...
NOTE: system has 54 non-global zones
done
SolarisCAT(vmcore.0/11X)> 

When CAT is unleashed on a dump, several "sanity checks" are run which can point out glaring known issues. There is an HTML document in the docs/ directory which outlines all the various sanity checks. These checks alone make CAT a must-have tool! Sanity check output comes in two varieties: "WARNING", which indicates something out of whack that may have caused or contributed to the crash, and "NOTE", which is unlikely to be the cause but is of interest. In the example above we can see two warnings telling me that 2 threads were each consuming 99% of a CPU... that's handy! It also notes that I'm running 54 zones.
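
If you save the startup output to a file, the two varieties are easy to separate mechanically. The following is a minimal sketch under that assumption; split_findings is a hypothetical helper of mine, not part of Solaris CAT, and the sample text is copied from the session above.

```python
SAMPLE = """\
sanity checks: settings...vmem...
WARNING: FSS thread 0xffffff097d1e3400 on CPU2 using 99%CPU
WARNING: FSS thread 0xffffff09fddbab40 on CPU3 using 99%CPU
sysent...clock...misc...
NOTE: system has 54 non-global zones
done
"""

def split_findings(text):
    """Group sanity-check lines by their WARNING/NOTE prefix."""
    findings = {"WARNING": [], "NOTE": []}
    for line in text.splitlines():
        for level in findings:
            prefix = level + ": "
            if line.startswith(prefix):
                findings[level].append(line[len(prefix):])
    return findings

findings = split_findings(SAMPLE)
print(len(findings["WARNING"]), "warnings,", len(findings["NOTE"]), "notes")
for text in findings["WARNING"]:
    print("  check first:", text)
```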

The available commands are broken down into categories which you can see using the "help" command. The first group is for "Initial Investigation:" and includes: analyze, coreinfo, msgbuf, panic, stack, stat, and toolinfo. Let's look at the "analyze" command's output:

SolarisCAT(vmcore.0/11X)> analyze

core file:      /var/crash/xxxxxx/vmcore.0
user:           Super-User (root:0)
release:        5.11 (64-bit)
version:        snv_67
machine:        i86pc
node name:      xxxxxx
system type:    i86pc
hostid:         xxxxx
dump_conflags:  0x10000 (DUMP_KERNEL) on /dev/dsk/c0t0d0s1(24.0G)
time of crash:  Mon Aug 25 07:41:00 GMT 2008 (core is 13 days old)
age of system:  91 days 22 hours 49 minutes 50.97 seconds
panic CPU:      1 (8 CPUs, 31.9G memory)
panic string:   page_free pp=ffffff0007243bd8, pfn=11228e, lckcnt=0, cowcnt=0 slckcnt = 0


==== panic thread: 0xfffffffef4ce5dc0 ==== CPU: 1 ====
==== panic user (LWP_SYS) thread: 0xfffffffef4ce5dc0  PID: 10156  on CPU: 1 ====
cmd: /opt/local/sbin/httpd -k start
t_procp: 0xffffffff06595e50
  p_as: 0xffffffff093490e0  size: 47374336  RSS: 3125248
  hat: 0xffffffff092a9480  cpuset: 1
  zone: address translation failed for zone_name addr: 8 bytes @ 0x3

t_stk: 0xffffff00486bcf10  sp: 0xffffff00486bc880  t_stkbase: 0xffffff00486b8000
t_pri: 3(FSS)  pctcpu: 0.380035
t_lwp: 0xfffffffefe61ab60  lwp_regs: 0xffffff00486bcf10
  mstate: LMS_SYSTEM  ms_prev: LMS_SYSTEM
  ms_state_start: 2 minutes 31.229022230 seconds earlier
  ms_start: 2 minutes 31.343582414 seconds earlier
psrset: 0  last CPU: 1  
idle: 0 ticks (0 seconds)
start: Mon Aug 25 07:41:00 2008
age: 0 seconds (0 seconds)
syscall: #131 memcntl(, 0x0) ()
tstate: TS_ONPROC - thread is being run on a processor
tflg:   T_PANIC - thread initiated a system panic
        T_DFLTSTK - stack is default size
tpflg:  TP_MSACCT - collect micro-state accounting information
tsched: TS_LOAD - thread is in memory
        TS_DONT_SWAP - thread/LWP should not be swapped
        TS_RUNQMATCH
pflag:  SMSACCT - process is keeping micro-state accounting
        SMSFORK - child inherits micro-state accounting

pc:      unix:vpanic_common+0x13b:  addq   $0xf0,%rsp

unix:vpanic_common+0x13b()
unix:panic+0x9c()
unix:page_free+0x22e()
unix:page_destroy+0x100()
genunix:fs_dispose+0x2e()
genunix:fop_dispose+0xdc()
genunix:pvn_getdirty+0x1f0()
zfs:zfs_putpage+0x129()
genunix:fop_putpage+0x65()
genunix:segvn_sync+0x39f()
genunix:as_ctl+0x1f2()
genunix:memcntl+0x709()
unix:_syscall32_save+0xbf()
-- switch to user thread's user stack --

This output provides a vast array of useful details, including:

  • System summary, including OS release and version, architecture, hostname, and hostid, as well as number of CPUs and memory
  • Time of crash and previous uptime ("age of system")
  • The panic string and the CPU that it occurred on
  • The thread that caused the panic and its details, including the command (argc & argv), its memory footprint (size & RSS), and zone
  • The thread's state information, run time, start time, and current syscall
  • The call stack

As noted in Part 1, what most people are really looking for when doing core analysis is to determine which application was responsible, and this output provides that data with great clarity. Let's dig into it a bit more explicitly... based on the above "analyze" output we can see that....

  • The system is an 8-CPU x86 box running snv_67 (Solaris Nevada Build 67) in 64-bit mode with 32GB of RAM.
  • The system crashed on Aug 25th at 7:41AM GMT; it had previously been up for 91 days.
  • The system panicked in the "page_free" call, on CPU 1.
  • The running thread was "httpd -k start"... an Apache worker process.
  • The process had the PID 10156, consumed 3.1MB of physical memory (RSS), and had a virtual size of 47MB.
  • The process was using less than 1% (pctcpu) of CPU 1 and was using the Fair Share Scheduler (FSS), on Processor Set (psrset) 0.
  • The process started on Aug 25th at 7:41AM GMT and was 0 seconds old when it crashed... possibly a forked worker gone bad.
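
The memory figures in that list are just the raw "size" and "RSS" numbers from the analyze output scaled down. Here is a quick sketch of the conversion, assuming (as the summary does) that both values are byte counts; the helper name is mine.

```python
def to_mb(num_bytes):
    """Bytes to decimal megabytes, matching the rounded figures above."""
    return num_bytes / 1_000_000

size_bytes = 47374336   # "size:" field of the panic thread's process
rss_bytes = 3125248     # "RSS:" field
pctcpu = 0.380035       # "pctcpu:" field

print(round(to_mb(size_bytes), 1), "MB virtual")
print(round(to_mb(rss_bytes), 1), "MB resident")
print("under 1% CPU:", pctcpu < 1.0)
```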

For many administrators this might be as much as you wanted to know, right there. But let's look at a couple more commands.

You'll recall that during the sanity checks at startup it noted 2 threads consuming full CPUs. We can feed a thread address to the "thread" command to get details on it:

SolarisCAT(vmcore.0/11X)> thread 0xffffff097d1e3400
==== user (LWP_SYS) thread: 0xffffff097d1e3400  PID: 27446  on CPU: 2 ====
cmd: nano svn-commit.tmp
t_procp: 0xffffffff2e908ab0
  p_as: 0xffffffff10402ee0  size: 2772992  RSS: 1642496
  hat: 0xffffffff102f6b48  cpuset: 2
  zone: address translation failed for zone_name addr: 8 bytes @ 0x2

t_stk: 0xffffff004e47ef10  sp: 0xffffff003d3fcf08  t_stkbase: 0xffffff004e47a000
t_pri: 26(FSS)  pctcpu: 99.306175
t_lwp: 0xffffffff202a78b0  lwp_regs: 0xffffff004e47ef10
  mstate: LMS_SYSTEM  ms_prev: LMS_USER
  ms_state_start: 2 minutes 31.228983791 seconds earlier
  ms_start: 39 days 19 hours 11 minutes 8.989252296 seconds earlier
psrset: 0  last CPU: 2  
idle: 9 ticks (0.09 seconds)
start: Wed Jul 16 12:30:07 2008
age: 3438653 seconds (39 days 19 hours 10 minutes 53 seconds)
syscall: #98 sigaction(, 0x0) ()
tstate: TS_ONPROC - thread is being run on a processor
tflg:   T_DFLTSTK - stack is default size
tpflg:  TP_TWAIT - wait to be freed by lwp_wait
        TP_MSACCT - collect micro-state accounting information
tsched: TS_LOAD - thread is in memory
        TS_DONT_SWAP - thread/LWP should not be swapped
        TS_RUNQMATCH
pflag:  SMSACCT - process is keeping micro-state accounting
        SMSFORK - child inherits micro-state accounting

pc:      unix:panic_idle+0x23:  jmp    -0x2     (unix:panic_idle+0x23)

unix:panic_idle+0x23()
0xffffff003d3fcf60()
-- error reading next frame @ 0x0 --

So using the "thread" command we can get full granularity on a given thread. In fact, using the "tlist" command you can dump this information for every thread on the system at the time of crash.
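
As a cross-check on the thread output above, the "age" field should equal the crash time minus the thread's "start" time. A small sketch, assuming both timestamps are in the same timezone (the output does not say so for the "start" line):

```python
from datetime import datetime

start = datetime(2008, 7, 16, 12, 30, 7)  # "start:" line of the nano thread
crash = datetime(2008, 8, 25, 7, 41, 0)   # "time of crash" from the header

age = crash - start
print(int(age.total_seconds()), "seconds")
print(age)
```

The result lines up with the 3438653 seconds (39 days 19 hours 10 minutes 53 seconds) that CAT reports, so the editor session really had been sitting there for over a month.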

Another nifty command is "tunables". This will display the "current value" (at time of the dump) and the default value. If someone's been experimenting on the production systems this will clue you in.

SolarisCAT(vmcore.0/11X)> tunables   
    Tunable Name     Current   Default Value  Units      Description
                     Value                               
    physmem          8386375   *              pages      Physical memory 
                                                         installed in system.
    freemem          376628    *              pages      Available memory.
    avefree          338943    *              pages      Average free memory 
                                                         in the last 30 seconds
.........
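
The page counts above are easier to read as sizes. This sketch assumes a 4096-byte base page, which is the usual x86 value but is not shown anywhere in the output; the helper name is mine.

```python
PAGE_SIZE = 4096  # assumed x86 base page size in bytes; not from the dump

def pages_to_gib(pages, page_size=PAGE_SIZE):
    """Convert a page count to gibibytes."""
    return pages * page_size / 2**30

print(round(pages_to_gib(8386375), 2), "GiB physmem")
print(round(pages_to_gib(376628), 2), "GiB freemem")
```

Under that assumption physmem comes out just shy of 32 GiB, which agrees with the "31.9G memory" line in the dump header.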

Using the "dispq" command we can look at the dispatch queues (run queues). This answers "what other processes were running on CPU at the time of the crash?" Again, using the thread address we can dig into them with "thread":

SolarisCAT(vmcore.0/11X)> dispq
      CPU                  thread               pri        PID cmd
  0 @ 0xfffffffffbc26bb0   0xffffff003d005c80    -1            (idle)
               pri  60 -=> 0xffffff004337dc80    60          0 sched
  1 @ 0xfffffffec6634000 P 0xfffffffef4ce5dc0 P   3      10156 /opt/local/sbin/httpd -k start
  2 @ 0xfffffffec662f000   0xffffff097d1e3400    26      27446 nano svn-commit.tmp
  3 @ 0xfffffffec66f4800   0xffffff09fddbab40    25      21329 java -jar xxxxx.jar --ui=console
  4 @ 0xfffffffec66ea800   0xffffff003d414c80    -1            (idle)
               pri  60 -=> 0xffffff0048b12c80    60          0 sched
  5 @ 0xfffffffec6770800   0xffffff003d4b0c80    -1            (idle)
  6 @ 0xfffffffec6770000   0xffffff003d53bc80    -1            (idle)
  7 @ 0xfffffffec6762000   0xffffff003d58fc80    -1            (idle)

      part                 thread               pri        PID cmd
  0 @ 0xfffffffffbc4eef0

There are far too many commands to go through in a blog entry... but let's look at my personal favorite, "zfs". The "zfs" command can show us the pool(s), their configuration, read/write/checksum/error stats, and even ARC stats!

SolarisCAT(vmcore.0/11X)> zfs -e
ZFS spa @ 0xfffffffec6c21540
    Pool name: zones
    State: ACTIVE
       VDEV Address      State    Aux   Description
    0xfffffffec0a9e040  FAULTED    -       root

            READ   WRITE   FREE   CLAIM   IOCTL  
    OPS        0      0     0      0      0 
    BYTES      0      0     0      0      0 

    EREAD       0
    EWRITE      0
    ECKSUM      0

            VDEV Address      State    Aux     Description
         0xfffffffec0a9eac0  FAULTED    -    /dev/dsk/c0t1d0s0

                  READ      WRITE     FREE   CLAIM   IOCTL  
         OPS     74356305  578263155     0      0      0 
         BYTES       757G      10.4T     0      0      0 

         EREAD       0
         EWRITE      0
         ECKSUM      0
SolarisCAT(vmcore.0/11X)> zfs arc

ARC (Adaptive Replacement Cache) Stats:

    hits                       77708247444
    misses                         1930348
    demand_data_hits           74303514929
    demand_data_misses             1325511
    demand_metadata_hits         620388795
    demand_metadata_misses          160708
    prefetch_data_hits          1361651307
....
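
Raw hit and miss counters are more meaningful as ratios. A minimal sketch using the numbers printed above; hit_ratio is my own helper, not a CAT command.

```python
def hit_ratio(hits, misses):
    """Fraction of lookups that were served from the cache."""
    return hits / (hits + misses)

overall = hit_ratio(77708247444, 1930348)
demand_data = hit_ratio(74303514929, 1325511)
demand_meta = hit_ratio(620388795, 160708)

print(f"overall     {overall:.4%}")
print(f"demand data {demand_data:.4%}")
print(f"demand meta {demand_meta:.4%}")
```

On this box the ARC was serving well over 99.9% of lookups from memory, so cache misses are an unlikely suspect for whatever went wrong.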

I hope this helps you get an idea of how easy it is to really dig deeply into your core dumps using Solaris CAT to hide the oddities of mdb from you. It's a powerful and robust tool, and I'm glad that we have it.

Happy dump divin'! You'll be amazed how much you'll learn about your system.

"Planet Sysadmin"Request for Feedback on Deny by Default

A friend of mine is working on digital defense strategies at work. He is interested in your commentary and any relevant experiences you can share. He is moving from a "deny bad, allow everything else" policy to an "allow good, deny everything else" policy.

By policy I mean a general approach to most if not all defensive strategies. On the network, define which machines should communicate, and deny everything else. On the host, define what applications should run, and deny everything else. In the browser, define what sites can be visited, and deny everything else. That's the central concept, although expansions are welcome.

My friend would like to know if anyone in industry is already following this strategy, and to what degree. If you can name your organization all the better (even if privately to me, or to him once the appropriate introductions are made). Thank you.

"Planet Sysadmin"Solaris Core Analysis, Part 1: mdb

Solaris is one of the most stable operating systems available... but let's face it, stuff happens. Solaris does panic, but I want everyone to be clear: a "panic", despite the seemingly contradictory name, is by its nature a controlled event. When the kernel encounters behavior that is uncorrectable and will cause irreparable harm to the running system or, even worse, corrupt data, the system will voluntarily tap out by calling the kernel's panic routine to get the system down quickly, hopefully leaving a core dump in its wake for post-mortem analysis.

In this blog entry we'll discuss core dumps and panics in general. In part 2 we'll discuss a tool to make life just a little easier, the Solaris Crash Analysis Tool, or "Solaris CAT".

I want to point out that post-mortem core analysis is really the task of a kernel engineer. The fact is, way less than 1% of us who ever engage in core analysis are actually going to have any real idea of what the hell we're doing. And that's OK! I guarantee that you'll post something from an analysis to a mailing list and you'll get some asshole who forgets that he's been paid to work on the Solaris kernel for the last 20 years while you work a job which is now on hold because of said core dump, with replies like "We can clearly see that due to the memory address in this register that you are a moron...." The point here is, if you don't know what you're doing, don't be discouraged. What we mere mortals are trying to do is not necessarily solve the problem but provide clues which will help guide our search, whether by posting a stack trace to a mailing list, sending the dump to Sun Support, or taking a panic string and searching the bug database or Google for it. The cuddletech rule of crashes is:

An unexpected crash is unacceptable; An unexplained crash is inexcusable.

If you're reading this you've probably lived through a panic before, but lets recap. The best explanation of a "crash" event and resulting dump can be found in the dumpadm(1M) man page:

     A crash dump is a disk copy of the physical memory of the
     computer at the time of a fatal system error. When a fatal
     operating system error occurs, a message describing the
     error is printed to the console. The operating system then
     generates a crash dump by writing the contents of physical
     memory to a predetermined dump device, which is typically a
     local disk partition. The dump device can be configured by
     way of dumpadm. Once the crash dump has been written to the
     dump device, the system will reboot.

     Fatal operating system errors can be caused by bugs in the
     operating system, its associated device drivers and loadable
     modules, or by faulty hardware. Whatever the cause, the
     crash dump itself provides invaluable information to your
     support engineer to aid in diagnosing the problem. As such,
     it is vital that the crash dump be retrieved and given to
     your support provider. Following an operating system crash,
     the savecore(1M) utility is executed automatically during
     boot to retrieve the crash dump from the dump device, and
     write it to a pair of files in your file system named unix.X
     and vmcore.X, where X is an integer identifying the dump.
     Together, these data files form the saved crash dump. The
     directory in which the crash dump is saved on reboot can
     also be configured using dumpadm.

I encourage you to read both the savecore(1M) and dumpadm(1M) man pages. You'll find that with savecore -L you can create a dump of a live system, so if you don't have a crashed system around to play with, use that. Alternatively, you can use reboot -d to dump a core and reboot.

At this point we'll assume you have a dump available. By default you'll find them in /var/crash/hostname/, in pairs: vmcore.0 and unix.0. We feed these two files to mdb, the Modular Debugger (-k for kernel), to perform our analysis like so:

# mdb -k unix.0 vmcore.0 
Loading modules: [ unix krtld genunix specfs dtrace cpu.AuthenticAMD.15 uppc pcplusmp ufs ip sctp usba lofs zfs random ipc md fcip fctl fcp crypto logindmux ptm nfs ]
>
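
Because dumps come in numbered unix.N/vmcore.N pairs, it can help to check which numbers are complete before starting the debugger. A small sketch; dump_pairs is a hypothetical helper and the file names are example input.

```python
import re

def dump_pairs(names):
    """Return sorted dump numbers N for which both unix.N and vmcore.N exist."""
    found = {"unix": set(), "vmcore": set()}
    for name in names:
        match = re.fullmatch(r"(unix|vmcore)\.(\d+)", name)
        if match:
            found[match.group(1)].add(int(match.group(2)))
    return sorted(found["unix"] & found["vmcore"])

# Example listing; on a real system you could pass the result of
# os.listdir() on your crash directory instead.
listing = ["bounds", "unix.0", "vmcore.0", "unix.1"]
print(dump_pairs(listing))
```

Here dump 1 is skipped because its vmcore file is missing, so only dump 0 is worth opening.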

You are now free to move about the dump. mdb commands are strange and unusual at first, and it takes a lot of time to get comfortable with them, but there are a couple of debugger commands that can give us the essence of what we need. Let's walk through them.

The ::status command will display high-level information regarding this debugging session. Most useful here are the dump's "panic message" and OS release.

> ::status
debugging crash dump vmcore.0 (64-bit) from hostname
operating system: 5.11 snv_43 (i86pc)
panic message: BAD TRAP: type=e (#pf Page fault) rp=fffffe80000ad3d0 addr=0 occurred in module "unix" due to a NULL pointer dereference
dump content: kernel pages only

The ::stack command will provide you with a stack trace; this is the same trace you would have seen in syslog or on the console.

> ::stack
atomic_add_32()
nfs_async_inactive+0x55(fffffe820d128b80, 0, ffffffffeff0ebcb)
nfs3_inactive+0x38b(fffffe820d128b80, 0)
fop_inactive+0x93(fffffe820d128b80, 0)
vn_rele+0x66(fffffe820d128b80)
snf_smap_desbfree+0x78(fffffe8185e2ff60)
dblk_lastfree_desb+0x25(fffffe817a30f8c0, ffffffffac1d7cc0)
dblk_decref+0x6b(fffffe817a30f8c0, ffffffffac1d7cc0)
freeb+0x89(fffffe817a30f8c0)
tcp_rput_data+0x215f(ffffffffb4af7140, fffffe812085d780, ffffffff993c3c00)
squeue_enter_chain+0x129(ffffffff993c3c00, fffffe812085d780, fffffe812085d780, 1, 1)
ip_input+0x810(ffffffffa23eec68, ffffffffaeab8040, fffffe812085d780, e)
i_dls_link_ether_rx_promisc+0x266(ffffffff9a4c35f8, ffffffffaeab8040, fffffe812085d780)
mac_rx+0x7a(ffffffffa2345c40, ffffffffaeab8040, fffffe812085d780)
e1000g_intr+0xf6(ffffffff9a4b2000)
av_dispatch_autovect+0x83(1a)
intr_thread+0x50()
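
When you want to search a bug database or mailing list for a similar crash, the function names matter more than the offsets and argument addresses. Here is a sketch that strips a saved ::stack listing down to names; frame_names is my own helper, and the sample is the top of the trace above.

```python
import re

STACK = """\
atomic_add_32()
nfs_async_inactive+0x55(fffffe820d128b80, 0, ffffffffeff0ebcb)
nfs3_inactive+0x38b(fffffe820d128b80, 0)
fop_inactive+0x93(fffffe820d128b80, 0)
vn_rele+0x66(fffffe820d128b80)
"""

def frame_names(stack_text):
    """Strip offsets and arguments, leaving one function name per frame."""
    names = []
    for line in stack_text.splitlines():
        match = re.match(r"([A-Za-z_][\w.:]*)", line.strip())
        if match:
            names.append(match.group(1))
    return names

print(" <- ".join(frame_names(STACK)))
```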

The ::msgbuf command will output the message buffer at the time of crash; the message buffer is most commonly used by sysadmins through the "dmesg" command.

> ::msgbuf
MESSAGE                                                               
....
WARNING: IP: Hardware address '00:14:4f:xxxxxxx' trying to be our address xxxx
WARNING: IP: Hardware address '00:14:4f:xxxx' trying to be our address xxxx

panic[cpu0]/thread=fffffe80000adc80: 
BAD TRAP: type=e (#pf Page fault) rp=fffffe80000ad3d0 addr=0 occurred in module "unix" due to a NULL pointer dereference

sched: 
#pf Page fault
Bad kernel fault at addr=0x0
.... blah blah, snipped for brevity.

The ::panicinfo command will give you lots of fun, cryptic register information; of most interest are the first 3 lines, which contain the CPU on which the panic occurred, the running thread, and the panic message. You'll notice these are commonly repeated and are the most useful pieces of information.

> ::panicinfo
             cpu                0
          thread fffffe80000adc80
         message BAD TRAP: type=e (#pf Page fault) rp=fffffe80000ad3d0 addr=0 occurred in module "unix" due to a NULL pointer dereference
             rdi                0
             rsi                1
             rdx fffffe80000adc80
             rcx                0
              r8                0
              r9 fffffe80dba125c0
             rax                0
             rbx fffffe8153a36040
             rbp fffffe80000ad4e0
             r10              3e0
             r10              3e0
             r11 ffffffffaeab8040
             r12 ffffffffb7b4cac0
             r13                0
             r14 fffffe820d128b80
             r15 ffffffffeff0ebcb
          fsbase ffffffff80000000
          gsbase fffffffffbc27850
              ds               43
              es               43
              fs                0
              gs              1c3
          trapno                e
             err                2
             rip fffffffffb838680
              cs               28
          rflags            10246
             rsp fffffe80000ad4c8
              ss                0
          gdt_hi                0
          gdt_lo         defacedd
          idt_hi                0
          idt_lo         80300fff
             ldt                0
            task               60
             cr0         80050033
             cr2                0
             cr3        10821b000
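
If you want to pull those first three fields out of saved ::panicinfo output (for a ticket or a crash log, say), a few lines of Python will do it. This is only a minimal sketch: it assumes the two-column "name value" layout shown above, and the function name parse_panicinfo is my own invention, not part of mdb.

```python
def parse_panicinfo(text):
    """Turn saved ::panicinfo output into a dict of name -> value strings."""
    fields = {}
    for line in text.splitlines():
        # Each line is a name followed by its value; the value may contain spaces.
        parts = line.split(None, 1)
        if len(parts) == 2:
            name, value = parts
            # Keep the first occurrence if a name shows up more than once.
            fields.setdefault(name, value.strip())
    return fields


# A few lines copied from the output above (assumed layout).
sample = """\
             cpu                0
          thread fffffe80000adc80
         message BAD TRAP: type=e (#pf Page fault) rp=fffffe80000ad3d0 addr=0 occurred in module "unix" due to a NULL pointer dereference
             rdi                0
             cr2                0
"""

info = parse_panicinfo(sample)
# The three fields the text calls out as the most useful.
summary = {key: info[key] for key in ("cpu", "thread", "message")}
for key, value in summary.items():
    print(f"{key}: {value}")
```

Everything stays a string here on purpose; the register values are hex and only matter if you go digging further.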

In my opinion, the koolest command is ::cpuinfo -v. Truth be told, if you run multiple applications on a server, the question people (especially managers) most often want answered is "which application did it?", which translates into geek-ese as "who do I blame?" This command will help you determine that by displaying, complete with beautiful ASCII art, the threads and process names running on each CPU (NRUN). In the following example, we know the event occurred on CPU 0, so that's the one we want to look at. Note that the "sched" process should be interpreted as "kernel".

>  ::cpuinfo -v
 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  0 fffffffffbc2f370  1b    1    0 165   no    no t-1    fffffe80000adc80 sched
                       |    |    |
            RUNNING --+    |    +--> PIL THREAD
              READY         |           6 fffffe80000adc80
             EXISTS         |           - fffffe80daab6a20 ruby
             ENABLE         |
                            +-->  PRI THREAD           PROC
                                   99 fffffe8000b88c80 sched

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  1 ffffffff983b3800  1f    1    0  59  yes    no t-0    fffffe80daac2f20 smtpd
                       |    |
            RUNNING --+    +-->  PRI THREAD           PROC
              READY                99 fffffe8000bacc80 sched
           QUIESCED
             EXISTS
             ENABLE

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  2 ffffffff9967a800  1f    2    0  -1   no    no t-0    fffffe8000443c80 (idle)
                       |    |
            RUNNING --+    +-->  PRI THREAD           PROC
              READY                99 fffffe8000b82c80 sched
           QUIESCED                60 fffffe80018f8c80 sched
             EXISTS
             ENABLE

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  3 ffffffff9967a000  1f    1    0  -1   no    no t-0    fffffe8000535c80 (idle)
                       |    |
            RUNNING --+    +-->  PRI THREAD           PROC
              READY                60 fffffe8000335c80 zsched
           QUIESCED
             EXISTS
             ENABLE
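
If you save this output to a file, matching the panicking CPU to its thread and process can be scripted. The following is a rough sketch, not an mdb feature: it assumes the summary-row layout shown above (ID, ADDR, FLG, NRUN, BSPL, PRI, RNRN, KRNRN, SWITCH, THREAD, then an optional PROC), and the names ROW and cpu_summary are hypothetical. It ignores the ASCII-art detail rows entirely.

```python
import re

# One summary row per CPU, in the column order of the header line above.
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<addr>[0-9a-f]+)\s+(?P<flg>[0-9a-f]+)\s+(?P<nrun>\d+)\s+"
    r"(?P<bspl>\d+)\s+(?P<pri>-?\d+)\s+(?P<rnrn>\w+)\s+(?P<krnrn>\w+)\s+"
    r"(?P<switch>t-\d+)\s+(?P<thread>[0-9a-f]+)(?:\s+(?P<proc>\S+))?\s*$"
)


def cpu_summary(text):
    """Map CPU id -> {'thread': ..., 'proc': ...} from saved ::cpuinfo -v output."""
    cpus = {}
    last = None
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            last = int(match.group("id"))
            cpus[last] = {"thread": match.group("thread"), "proc": match.group("proc")}
        elif line.strip() == "(idle)" and last is not None and cpus[last]["proc"] is None:
            # An "(idle)" marker wrapped onto its own line belongs to the row before it.
            cpus[last]["proc"] = "(idle)"
    return cpus


# Rows copied from the output above; the detail rows are skipped by the parser.
sample = """\
 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  0 fffffffffbc2f370  1b    1    0 165   no    no t-1    fffffe80000adc80 sched
                                   99 fffffe8000b88c80 sched
  2 ffffffff9967a800  1f    2    0  -1   no    no t-0    fffffe8000443c80
 (idle)
"""

cpus = cpu_summary(sample)
panic_cpu = 0  # taken from the "cpu" line of ::panicinfo
print(panic_cpu, cpus[panic_cpu]["thread"], cpus[panic_cpu]["proc"])
```

Note that this only tells you what the summary row says. When the CPU was in an interrupt thread, as in the example, the thread that was interrupted is shown in the detail rows underneath, so you still want to read the picture yourself.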

The ::ps command allows us to see all running processes. Several flags are supported, including -z to display zone IDs.

> ::ps -z
S    PID   PPID   PGID    SID  ZONE    UID      FLAGS             ADDR NAME
R      0      0      0      0     0      0 0x00000001 fffffffffbc25900 sched
R      3      0      0      0     0      0 0x00020001 ffffffff9970d928 fsflush
R      2      0      0      0     0      0 0x00020001 ffffffff9970e558 pageout
R      1      0      0      0     0      0 0x42004000 ffffffff9970f188 init
R  20534      1  20533  20533    24   1006 0x42010400 ffffffffb246f9b8 ruby
R  20532      1  20531  20531    24   1006 0x42010400 fffffe8109674308 ruby
R  20529      1  20528  20528    24   1006 0x42010400 fffffe80dc5602f0 ruby
...

We can use ::pgrep to search for processes and use the appropriate address for further digging. In the following example I'll find a Java process and then determine which zone that process was running in:

> ::pgrep java
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R   3628      1   3620   3574      0 0x42004400 fffffe80deeb3240 java
> fffffe80deeb3240::print proc_t p_zone->zone_name
p_zone->zone_name = 0xffffffffae0cef00 "testzone03"

There are many more tools and ways to dig into your dumps using mdb. It can be confusing because you need to reference things by address, but you get more comfortable with it as you play around. If you are interested in learning more, I highly recommend reading Eric Lowe's "Examining the Anatomy of a Process", which digs into the topic of process examination via mdb.

One thing you'll notice in all this is that the messages at the time of the crash on the console or in syslog contain almost everything you need to know without digging too deeply. Therefore, assuming you have those messages, the most useful thing most people will extract from the core files is the output of the ::cpuinfo command, to see what process was on the offending CPU at the time of the crash. Knowing what processes, zones, etc. were running at the time of the crash is interesting, but it rarely means much if they weren't directly involved in the panic.

As I said, once you start referencing memory addresses to deepen your analysis, things get sticky and tricky very, very quickly... that's where Solaris CAT comes in, which we'll talk about in part 2.

"Planet Sysadmin"Steve Gillmor interviews googlers on Google Chrome

"Planet Sysadmin"Links for 2008-09-05 [del.icio.us]

"Planet Sysadmin"Why negative DNS caching is necessary

DNS software in general has two forms of caching, which I've seen called 'positive' and 'negative'. Positive entries hold actual answers obtained from authoritative servers (theoretically, see Dan Kaminsky's DNS attack), while negative entries mark names that (theoretically) don't exist. Positive entries are cached for their TTL value; negative entries don't have a TTL themselves, but more or less inherit a TTL from the zone's SOA record.

(The details are complicated.)

Negative caching matters because it creates yet another block on rapidly updating your zone. Even if you control all of the primary and secondary nameservers and can update them on command, you may need to wait the negative cache TTL duration before you can be sure that everyone can see a newly created DNS name. (This is most likely to happen if somehow the name has accidentally been published before you've created it, so that people have started doing queries for it.)

One might reasonably ask why negative caching is important. The short answer is 'domain search paths'; many systems (okay, at least many Unix systems) can be configured so that they look up simple hostnames in more than one DNS domain. The existence of search paths means that you can make a lot of queries for names that don't exist, as you look up the hostname in each of your search domains until you finally find the one it's in (or you fall off the end and do a rooted DNS query).

(Negative caching is also important when you're using a DNS blocklist, because hopefully most of your queries are for things that aren't listed.)
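
To make the search path argument concrete, here is a small Python sketch that counts how many upstream queries a simple hostname lookup costs with a negative cache in place. Everything in it is made up for illustration: the domains, the 300 second negative TTL (a real resolver derives it from the zone's SOA record), and the names candidate_names, NegativeCache and resolve. The `existing` set stands in for real DNS, and real resolvers have more rules about when the search path applies.

```python
import time


def candidate_names(hostname, search_domains):
    """Names a resolver with a search path might try, in order, for a simple hostname."""
    if hostname.endswith("."):
        return [hostname]               # already rooted: no search path applies
    names = [f"{hostname}.{domain}." for domain in search_domains]
    names.append(f"{hostname}.")        # fall off the end: rooted query
    return names


class NegativeCache:
    """Remembers names that did not exist, each until its expiry time."""

    def __init__(self, clock=time.monotonic):
        self._expires = {}
        self._clock = clock

    def add(self, name, ttl):
        self._expires[name] = self._clock() + ttl

    def is_cached_missing(self, name):
        expiry = self._expires.get(name)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._expires[name]     # entry has timed out
            return False
        return True


def resolve(hostname, search_domains, existing, cache, negative_ttl=300):
    """Return (found_name, upstream_queries); negative_ttl is an assumed value."""
    queries = 0
    for name in candidate_names(hostname, search_domains):
        if cache.is_cached_missing(name):
            continue                    # answered from the negative cache
        queries += 1
        if name in existing:
            return name, queries
        cache.add(name, negative_ttl)
    return None, queries


existing = {"www.c.example."}           # hypothetical zone contents
domains = ["a.example", "b.example", "c.example"]
cache = NegativeCache()
first = resolve("www", domains, existing, cache)
second = resolve("www", domains, existing, cache)
print(first, second)
```

The first lookup has to ask about the name in every search domain ahead of the right one; the second skips straight past the names already known not to exist. The flip side is the one described above: a name created in a.example after the first lookup stays invisible to this cache until the negative entry times out.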

"The Reid Report"Sarah gets her neocon training wheels

The WaPo reveals that Sarah Palin is being tutored in foreign policy by the same coterie of neocons who brought us the Iraq War:

ST. PAUL, Minn., Sept. 4 -- Sen. Joseph I. Lieberman is among several national security experts helping brief Republican vice presidential nominee Sarah Palin on foreign policy issues as she prepares to hit the campaign trail while cramming for a debate with her Democratic opponent, Sen. Joseph R. Biden Jr. (Del.), in less than a month, according to officials from Sen. John McCain's campaign.

Lieberman, who was the 2000 Democratic vice presidential nominee but is now an independent, has helped introduce Palin to officials of the American Israel Public Affairs Committee, the leading pro-Israel lobby. In a meeting Tuesday, the day before she delivered her prime-time address at the Republican National Convention here, Palin assured the group of her strong support for Israel, of her desire to see the United States move its embassy from Tel Aviv to Jerusalem and of her opposition to Iran's aspirations to become a nuclear power, according to sources familiar with the meeting. ...
So who else is on the team?

The McCain campaign has tapped Stephen E. Biegun, the national security adviser to then-Senate Majority Leader Bill Frist (R-Tenn.), to be Palin's principal foreign policy adviser. Campaign aides said Biegun, who is currently a vice president of Ford, is not serving as Palin's tutor but is merely briefing her on details of key issues in a way that is similar to what other candidates are receiving.

"The attempt is not to turn her into a professor of foreign policy but trying to get her up to speed on all the nuances of foreign policy issues that are hot and John's positions," said John Lehman, a former Navy secretary who is one of McCain's advisers. "She's surprised everybody at how current she is on Middle East issues. She doesn't pretend to be a foreign policy expert, but neither is she somebody who hasn't thought about the issues."
Oh good, she has her own Condi Rice!

And...
Bushies Come to Palin's Aid
By Michael Isikoff

The McCain team has hastily assembled a team of former Bush White House aides to tutor the vice-presidential candidate, Alaska Gov. Sarah Palin, on foreign-policy issues, to write her speeches and to begin preparing her for her all-important Oct. 2 debate against Sen. Joe Biden.

Steve Biegun, who once served as the No. 3 National Security Council official under Condoleezza Rice at the White House, has been hired as chief foreign-policy adviser to the Alaska governor, campaign officials told NEWSWEEK. After taking leave from his job as vice president for international affairs at Ford Motor Co. last Friday, Biegun flew to St. Paul and, together with McCain’s foreign-policy guru Randy Schuenemann, began briefings for Palin on national-security issues—an area where her resume is conspicuously thin.

Plus...
Matt Scully, a former Bush White House speechwriter who helped draft some of the major foreign-policy addresses during the president’s first term, is working on Palin’s acceptance speech to the convention Wednesday night.

Mark Wallace, a former lawyer for the Bush 2000 campaign who served in a variety of administration jobs including chief counsel at the Federal Emergency Management Agency and deputy ambassador to the United Nations, has been put in charge of “prep” for the debate against Biden.

Wallace’s wife, Nicolle Wallace, the former White House communications director, has taken over the same job for Palin.

Tucker Eskew, another senior Bush White House communications aide, is serving as senior counselor to Palin’s operation.

Douglas Holtz-Eakin, the former chief economist at the Council of Economic Advisers who has been serving as top economics guru for the McCain campaign, has moved over to serve as Palin’s chief domestic-policy adviser.

Wow.

Maybe once she's fully indoctrinated, even Charles Krauthammer will learn to like her ... maybe.


"365 Tomorrows"Space Muffin

Author : Nik Gregory

The mess hall bustled around Harris; it was like a flock of vultures who had just found an overturned meat truck. Possession yields not only extended onto property but onto food too, woe betide anyone who gets the last muffin.

“All I’m saying is there’s something therapeutic about blowing up an asteroid,” stated Harris, feeling his point needed no justification.

“Spreading atomic waste throughout the entire cosmos is not what I call a therapeutic activity,” retorted Mila. She came from one of the nameless countries affected by the mass crawl into nuclear arms – it wasn’t nameless, just no one knew how to pronounce it except for Mila.

“Honey, we take the green pills for the bio’s, yellow ones for the chems, blue ones for the millisieverts and the red ones for the gammas,” said Hank; he sat scratching his sun burnt nose with the end of his spoon. “So I call bull on that.”

She conceded defeat and flickered a smile of someone half her age, “Well on that, we just got twenty moles and five scarabs in a courier this morning.”

“Twenty moles?” asked Hank.

“Yeah.”

“Shit, what do they expect us to blow up with that?”

Harris hit his head against the table, “We’re supposed to mine them, after all we are miners.”

“But how else are we supposed to split an asteroid down the fault lines? You can’t stick a prybar between two faults of nickel and push when they’re a million metric tonnes.” Hank pulled a cigar out of his breast pocket and tapped it on the table. “So Mila, what are you doing this evening?”

“I have a date with Guy Mitchells,” came her answer with an extra coy smile on the side.

“Oh, sorry,” said Harris in a mocking tone. “Are all the Walkers taken now?”

“I sure as fuck ain’t,” muttered Hank before sticking the cigar in his mouth.

“No, just they come from a small genetic pool.” She gestured toward Ed and Ted, a pair of non-related identical twins – their genetic line had stayed separate for over two millennia yet they ended up with identical fashion, beards and even the same scar gouged over their right eye.

“Okay that’s a valid point.”

“Hell yeah it is, we Walkers ain’t exactly a pretty bunch,” stated Hank to a puff of smoke, his stubbly chin seemingly more prominent through the haze.

“That’s why I picked a land lover.” She looked down the line to see Guy approach, his shoulders slenderer than hers and every other Walker.

He leant over, kissed her gently on the cheek and grabbed her muffin, “Thanks babe!”

Harris muttered, “Noob,” along with Hank.

“Oh, ‘hon’, one sec,” started Mila. She right hooked Guy, sending him toppling to the coarse regolith based concrete as she swiped back her muffin.

Mila’s attention drifted to the two guys and she said clemently, “What, it was the last one!”

September 05, 2008

"Planet Linux Australia"Arjen Lentz: Kristy Bennett's courses on marketing (including yourself)

Kristy Bennett is well known in OSS circles in Australia as a creative soul, as well as a serial (and concurrent) entrepreneur. Among many other things, she runs MIB Business Solutions providing a range of management and marketing services. This is usually done in-house, but every once in a while she offers public training days. The next upcoming ones are in Melbourne, in the evening of September 30th, and October 1st. A rare opportunity!

The evening session is Selling Yourself: Presenting with Confidence.
Whether you are looking for a new client or business partner, interviewing potential staff, or seeking a new employer, the art of 'selling yourself', either as an individual or as a representative of your business, product or service, is critical to finding what you are seeking. Moving away from the typical topics of marketing channels, sales and branding, what to wear, and even writing and reviewing applications, this session is solely focused on you and how you sell yourself into a role.

With a very practical, hands-on approach, you will be working through finding the words to say, the manner in which to say them, and the physical aspects of meeting with your potential 'purchaser'. This workshop is geared towards people with a technology or similar professional background and will look specifically at presentation manner.
The next day has two independent parts (that can be booked together also):
Quick Start Marketing: Starting your Marketing from Scratch and Leaving with a Strategic Plan
and
Getting Linked In: How to find your Marketing Match

Many people are wary of business/marketing training, but with the right person teaching, it's very valuable stuff. Take a peek. It's pretty cheap already, there's an earlybird discount for bookings before September 12th, and you also get some additional goodies, including a good book and a bottle of organic wine ;-)

Note: Open Query is providing the logistics for Kristy, similar to us organising Sebastian Bergmann teaching his "Quality Assurance in PHP Projects" workshop. Is Open Query going in a different direction with these new things? I don't think so, in fact I reckon it makes perfect sense: OSS has a strong but generally neglected business and marketing component.
For code, there's the great OSS community. For your business, there's OSIA but it's also good to get some clueful professional insights through training. I don't partner with just anybody, in fact I'm very picky. Training should not just be by a competent trainer with reasonable materials, it should be done by an expert in the field. In my opinion, the trainer is what makes training valuable. If you ask a tangential question during a