Dec 15

Who Thinks the Relaunched Performance Metrics are Acceptable?

HealthcareTechnical Evaluation of the Relaunched Web Site

On December 1, a updated version of was deployed which offers considerable improvements over the original October 1 launch. The administration and contractors issued some press releases and the general public and press just accepted it without really understanding the technical issues. Here’s my technical assessment of the published statements.

My Assessment of, Version 1, on October 1

As I described in my original blog post about, the site on October 1 was a technical disaster. I received a lot of criticism with my original assessment from those who thought I had a political agenda against ACA or people who simply wished the site was functional independent of the facts.

My assessment on October 1 was eventually vindicated. It took a few weeks for the media, general public, and administration to recognize that the issues were far more problematic than the politically attractive excuse of having too many users.

Will the Contractors Ever be Held Accountable?

The contractors who built the system didn’t seem to know what they were doing and didn’t prioritize the need to build a functional system. I wrote a blog post summarizing how these large IT government contractors often abuse taxpayers: Too Big to Fire: How Government Contractors on Maximize Profits

Unfortunately, the contractors who delivered the flawed system on October 1 were rewarded with additional contracts and funds to fix the mess they created. Our federal government procurement process actually gives them more money from their failure than if they did a good job. It’s no wonder that large IT government contractors continue to deliver technically mediocre results. As long as they make sure their lawyers are more powerful than the government lawyers, they can deflect ALL blame so they can continue to use their “unblemished” past performance to go after new contracts. We will see if any of the contractors here are held accountable for this fiasco when they seek future business. This contractor behavior extends across all branches of government.

It would be amazing to interview the developers who actually worked on the original project, discover what their prior experience was, what they were being paid, and how much the taxpayers were billed for their “expertise”. The contractors are enforcing confidentiality rules to prevent those people from talking to the press in the “interest” of protecting taxpayers. I thnk it’s pretty clear which interests they’re trying to protect.

Enhancements in Version 2

When the administration recognized the technical disaster, they brought in Jeff Zients to lead the disaster relief team. It’s a small world. Mr. Zients and I actually worked at the same firm (SPA/Mercer) before I started FMS, though I left a few years before he joined. Through his leadership, he added some experienced people and reorganized the team while using the same contractors. They issued a Progress and Performance Report which summarized their work:

  • System Stability: Uptime consistently above 90%
  • Reduced Error Rates: per page system time outs or failures from 6+% to 0.75%
  • System Capacity: 50,000 simultaneous users, 20-30 minutes per user for 800K per day
  • Software Fixes: 400+ Bugs Eliminated
  • Hardware Upgrades
  • Real-time Monitoring: Dedicated team focused on site monitoring and instant incident response
  • Improved Response Times: from 8 seconds to under 1 second
  • There was also Improved Window Shopping for users.

To a layman, these results seem adequate. To anyone familiar with commercial software development, they are far below what we or any of our clients would consider acceptable. This is not what professional software developers should deliver, nor what taxpayers should accept.

Review of Relaunch Accomplishments

I’m quite surprised others haven’t provided a technical review of the December 1 relaunch:

System Metrics: 90% up time (One Nine Availability is Awful)
Why do people think 90% availability is acceptable? Even their data showing 95% is awful for a web site. That’s not equivalent to an A in class.

90% up time means it’s down 10% or 2.4 hours per day. 95% is still down an hour. Most web sites have hosting uptime based on the number of 9’s. For instance, 3 nines means 99.9% up time. There are 8760 hours per year (365 days x 24 hours per day). A 99.9% availability means it’s down 8 hours a year. 99.99% availability is less than one hour down per year. High volume commercial web sites strive for 5 nines or less than 10 minutes of down time per year.

I have never heard of any web site or client expecting or satisfied with one 9 availability.

Error Rates Below 1% is Still Pretty Bad
99% sounds good for a class exam, but it’s not good for software. How can a production web site have a 0.75% error rate? The rate seems to be based on the number of pages which is far worse than users. If it’s based on users, with the 50,000 capacity, that’s 375 errors. But when it’s based on pages, assuming each user goes through 50 pages, 18,750 of their 2.5 million pages fail. That means 37.5% of users crash (18,750 divided by 50,000).

Of more concern is the cause of the errors. Software either works or it doesn’t. It doesn’t randomly fail. Is the platform failing 0.75% of the time without knowing why? That would be disturbing and could indicate lots of different bugs. If the contractors don’t know what’s causing the crashes in their buggy code, that raises very serious security implications.

Or do they know if people perform certain tasks that the system will always crash, and they expect people to do that only 0.75% of the time? Still not good, but better.

Beyond crashing bugs, the site may run without crashing but fail to perform properly such as the problems submitting accurate data to the insurance companies. Those non-crashing failures aren’t even part of this error rate which is already too high for a production system.

Capacity of only 50,000?
This is a very strange metric. One usually measures website traffic based on number of page views or transactions. The number of users can be supported by adding more bandwidth and instances of the application on more servers. The capacity issues comes from what people are doing. If they are browsing static pages (not entering data), the number of simultaneous users should be much larger. Even if they are entering data, the capacity to save the data should be much higher than 50,000.

It’s not clear what is causing the 50,000 bottleneck. It shouldn’t be the front-end web application. That should be designed to efficiently save user inputs. The users aren’t entering a lot of information in the grand scheme of data entry systems.

A well designed application would separate the real-time user experience from the more capacity constrained data lookup requirements that may have bottlenecks caused by slow legacy systems at the IRS, HHS, INS, etc. This simply means that the user would enter their information quickly, the system would process it offline, and an email would notify them when the verifications were complete.

Capacity Limitations are Odd
The web site begs the use of a commercial cloud provider that can automatically support the fluctuating volumes of users. A web site needs to accommodate the largest number of users, not the average. The large volume spike is ahead of us on the deadline date December 23rd. Volumes would drop considerably after that. By using a commercial cloud provider like Microsoft Azure or Amazon EC2, there would be no need to buy hardware to accommodate huge spikes in users or unnecessary after peak times.

We suspect it’s more profitable for the contractors to buy the extra hardware and configure it poorly than to use commercial cloud providers who would provide a better service for lower costs and profits. The contractors may have also implemented features that for “security reasons”, prevent the use of a commercial cloud provider. It could have justified the creation of their own private system even though it probably decreased security given the crashes they’ve experienced.

Software Fixes and Test Plan
Fixing over 400 bugs is obviously a very good thing, but is that enough? How did so many bugs slip through a Test Plan? And what critical bugs remain that they decided not to fix?

  • What was the test plan before October 1?
  • Were the tests conducted and what bugs were known before October 1?
  • How did they decide to release with those bugs?
  • How many bugs were found after October 1 and how were they identified?
  • Is the current Test Plan adequate?
  • What bugs were allowed in the relaunched version?
  • How are known and new bugs being handled?

Software development never reaches perfection but a good test plan covers the expected extremes to ensure the features work, unexpected errors are gracefully trapped, the system is scalable to support the expected number of users, and the site is secure.

In our experience, buggy software inevitably creates and reveals more bugs as bugs are addressed. Known problems with transmitting data to the insurance companies were already acknowledged. This implies this final step of the process was poorly tested, probably because all the preceding steps were failing. This would indicate many unknown issues that still need to be found and fixed.

If the original developers didn’t know what they were doing, trying to fix their work could be a waste of time. An experienced development team may be able to create a better solution in less time than fixing shoddy design and code from unqualified personnel.

Hardware: Do they Have Development, Testing and Staging Platforms?
The only reason I can see for such low availability is the lack of proper development, testing and staging environments. When we create web sites, our software developers need their own hardware to create and test their work without disrupting the production system. Testers need a separate platform to do their work and report back to the developers about the problems they encounter. And a staging site is necessary to review what’s about to be deployed. When the decision is made to release the new version, a switch can be made to make the staging site the new production one. In a modern host, the switch can be done almost instantaneously. Maybe it’s down for a short period to verify the new site is working, but it’s not down for extensive testing because the testing and staging environments already handle that.

Based on the information before the October 1 debut, it was clear that the standard software environment of development, testing, staging and production did not exist. How the managers of the project could have neglected this fundamental part of software development is beyond me, especially for the amount of money spent to build this site.

Without the proper platforms, it indicates the people didn’t even consider how they’d enhance and maintain the system over time, and further supports my contention that the people who created and managed this website had never been paid to build commercial database web sites before.

It really is software malpractice to not have the proper development, testing, staging and production platforms in place. The contractors should be liable for such neglect and reimburse the taxpayers.

Why Wasn’t the Site Redesigned for Simplicity, Performance, Scalability and Security?
There were many opportunities to redesign the site to make it more consumer friendly, reduce the amount of development and testing resources, support more users, and improve security. I list these missed opportunities based on what I have seen:

  • The account creation page should be one screen not three. We create multiple pages if entries in one screen impact the following screens. For the site, that’s not the case. For instance, if on the first page, you enter an email address that already exists in the system, you’re not told it’s invalid until you finish the third page and are forced to restart. That just adds load on the system. There’s also no need to create a different user name. Why not just use the email address? Most web sites have a one page account creation page, but we understand how having more pages is more profitable to the contractor.
  • The See Plans feature is a huge improvement for shopping. However, when someone finds and wants to buy a plan without a subsidy, there isn’t a way to do so without creating an account in the system. The site should simply direct the customer to the insurance company since the government is not involved with providing a subsidy. In addition to improving the customer experience, that would reduce the load on the web site so they can serve more customers. Get them off the site as quickly as possible!
  • There’s no need to ask for information that isn’t directly tied to calculating the subsidy. The “nice to have” questions on race can be discarded to improve response time, reduce the time it takes users to fill out the application form, and increase the number of users the site can support. It also increases capacity.


Over the years, we’ve helped lots of organizations design their software solutions, select technologies, specify architectures, and deliver solutions that are reliable, scalable, secure and maintainable. So much of the site seems to remain quite fragile.

I don’t mean to slam the many people worked hard to salvage the awful work of the initial developers. I’m sure they didn’t get to spend much time with their families over Thanksgiving. The relaunched site is definitely much better than the original version. But it only looks good when compared to that technical disaster. Can anyone claim the new metrics are acceptable for an enterprise quality, nationwide public site as important as this?

For more information, read my earlier blog post Too Big to Fire: How Government Contractors on Maximize Profits, and a newer post Designing a Data Entry System Properly; Overhauling the Web Site.

Dec 07

Too Big to Fire: How Government Contractors on Maximize Profits

Healthcare.govHow Could the Federal Government Spend So Much and Get So Little?

The government contractors in the project continue to make fortunes after delivering a technical disaster. Unfortunately, this is common among IT projects delivered by large government contractors. Each year the government spends billions for poorly designed or non-functional systems that never even get deployed.

When I wrote my original blog post about, I thought the web site was created by incompetent people. Now I believe that in addition to being incompetent or inexperienced, decisions were made to maximize contractor profits.

This is philosophically different from the way we think at FMS. We always want to deliver functional systems on-budget and on-time. We take pride in creating solutions that don’t require additional work to fix them. As a small firm, we’re held accountable for our deliverables. If we fail, we would never be invited back. For large government IT contractors, it’s a totally different world.

Blame Others

Over the years, many large IT government contractors have abused taxpayers so often that they forgot the public would actually use and judge their performance. Even now, with their legions of lawyers and media spin, they are deflecting the story to blame government officials, other contractors, etc. without taking any responsibility. I agree there were problems with other parties, but that’s in addition to their own behavior.

Charge Extra for What Should be Included

Government contractors are experts at adding change orders and generating more revenues for features that should be already included. For instance, we are now hearing about problems with data security for the site. Security should be implemented from the beginning. Anyone with any experience collecting personal information such as social security numbers and birthdays knows that. However, the government contractors who won these contracts based on “past performance” suddenly suggest others are to blame for not specifying it earlier. That’s like buying a car and discovering brakes were an add-on. No, it should be included without asking for it. Storing data requires doing it securely.

Too Big to Fire

Given the awful work delivered on October 1, there’s no chance that the same team can be trusted to deliver a functional system. They already showed the world what they considered shipping quality, yet they remain. Unfortunately, our existing procurement system keeps these large government contractors because they are simply Too Big to Fire.

No Accountability for Large Technology Contractors

A small government contractor that performed so badly would not be allowed back into these agencies. The large ones can deflect the blame and legally challenge any attempt to hold them accountable. They never issue refunds, and in fact, profit from their mistakes with awards of new contracts and change orders. They are effectively not held accountable for their awful past performance, so the disasters repeat themselves whether it’s at HHS, FBI, Air Force, IRS, etc. The federal government is littered with expensive projects that were never used or functional, but highly profitable for the contractors.

Our Government Contracting System Encourages This

What happened with is exactly what our system encourages contractors to do. Had the contractors finished on time and properly, they would have made less money than delivering a flawed system. The government has tried to privatize its services by using outside contractors. Unfortunately, these government contractors are specialists at getting government contracts and milking taxpayers more than their technical ability. They would never survive in the private sector.

Policy Makers Now Have Political Risk for Technology Decisions

This is the first time an administration has paid such huge political cost for mismanaging technology. Prior to this, Presidents understood they were responsible for the economy, jobs, wars, terrorism, crime, responding to natural disasters, etc. They never realized there was political risk with technology. President Reagan wasn’t blamed for the Space Shuttle exploding, but this administration has become responsible for this web site disaster.

Contractor Goals and Values Do Not Align with Policy Makers

Frankly, I don’t think the politicians have any better handle for designing rockets or web sites. They relied on contractors and these contractors misled them. The policy makers don’t realize the goals and values of the contractors differ from theirs. Watching our leaders say the self-serving things their contractors tell them is even more embarrassing. Those contractors are not your friends!

Lobbying and Post-Retirement Jobs Drive Business

Some government officials are swayed by promises of post-retirement jobs at the contractors they supervise. Look at all the former program managers, contracting officers, Congressmen, admirals and generals at these government contractors and their lobbyists to understand how business is done.

Contractors get on contracting vehicles like the IDIQ for (termed “Licenses to Hunt”), then send in their well connected people to get contracts directly or wire contracts by “helping” draft requests for proposals (RFPs) favoring their organization. Perfectly legal. Not taxpayer friendly.

Bipartisan Reform Required

This is a bipartisan issue because unless IT government contracting is reformed, this is going to bite future politicians/policymakers of both parties. They do not have the training or experience to manage these technology projects. Especially when the contractors are run by used car salesmen who say, “You should get the undercoating” and the government people are technically unqualified to say “No”.

Technology Accountability Office (TAO)

We need the creation of a Technology Accountability Office (TAO), similar to the GAO to help agencies properly manage and buy these solutions, or an agency that manages large IT projects so the Best Practices are dispersed across the agencies. Right now, politicians have no clue whether a project should cost $1 million or $100 million, and whether it can be done in 3 months or 3 years. It’s total chaos and taxpayers are paying the tab.

Related Resources

Here are a few opportunities where I’ve spoken about these issues.

fox-and-friendsNovember 30: Fox & Friends Live Interview with Clayton Morris

ObamaCare: Mistake or moneymaker?

A one-on-one interview with Clayton Morris for four minutes discussing how large government contractors profit from delivering systems that don’t work: “If we follow the money, we’ll see the stink in the system…Too Big to Fire”

fox-friends-2013-11-30-clayton-luke-graphic fox-friends-2013-11-30-luke

Greta van SusterenNovember 26: On the Record with Kimberly Guilfoyle

Will be in good health by Nov. 30?

Greta van Susteren is on vacation, so I chatted with Kimberly who was in New York City while I was on Greta’s studio in Washington, DC.

Kimberly-Guilfoyle-Luke Kimberly-Guilfoyle-Luke2

“Over time, I’m beginning to see that these government contractors who took over this project have essentially made every decision that favors them as much as possible – to maximize the cost to taxpayers, to maximize their profits.”

Related article by Greg Richter based on the broadcast: Software Developer: ACA Website Designers Just Lining Own Pockets

House Homeland Security CommitteeNovember 13: House Homeland Security Committee Testimony

I had the opportunity to discuss the problems with government IT contractors in my prepared testimony and questions from Chairman McCaul.

Homeland Security Committee Testimony

Testifying before the House Committee on Homeland Security

nj-star-ledgerDecember 17: The Star Ledger by Paul Mulshine

“Luke Chung is the best authority I’ve come across on the Obamacare software debacle.”

How contractors got rich by screwing up Obamacare

Additional Media Coverage for Changing the National Discourse on