CIO-SP4, Scooby-Doo edition: the case of the hidden Excel columns
The federal IT community is having a moment right now when it comes to Governmentwide Acquisition Contracts (known by the delightful abbreviation "GWACs"). After a blockbuster decision earlier this year that affected a handful of GWACs[1], folks were on pins and needles waiting for the outcome of a whole bunch of protests of the Chief Information Officer-Solutions and Partners 4 ("CIO-SP4") GWAC, which is run by the National Institutes of Health's Information Technology Acquisition and Assessment Center ("NITAAC").
Spoiler alert: GAO sustained those protests and now there's a bit of a do-over in the works.
I said that folks were on pins and needles but, to be honest, they've been on pins and needles over CIO-SP4 for a long, long time. As one observer noted:
If you are keeping count at home, NITAAC has now faced 188 protests over CIO-SP4 in almost two years, according to GAO’s bid protest docket. Fortunately for NITAAC, it has won most of them. But the delays in getting CIO-SP4 going and the costs to the companies and the government are main subplots to the entire saga.
Indeed, CIO-SP3 — the current-generation GWAC that CIO-SP4 is intended to replace — was supposed to expire in May 2022. But here we are in July 2023, and CIO-SP3 continues to be extended.[2]
A major reason CIO-SP4 has drawn so many protests and taken so long is that so much is at stake. Because of their scope, duration, and intended scale, GWACs have somewhat absurd ceiling amounts. CIO-SP4 is no different: it is a 10-year, $50 billion contract.
For a GWAC to accommodate an estimated average of $5 billion per year, it should theoretically have a pretty sizable pool of companies. In the case of CIO-SP4, the solicitation contemplated 305 to 510 IDIQ contracts.
With that much cheese on the line, the agency (correctly) predicted that there would be a lot of companies submitting proposals for CIO-SP4. In the end, there were 1,150 proposals! That's a lot of work for the agency!
And, expecting many proposals, NIH used two strategies to limit the amount of work that it had to do: (1) use a phased-evaluation approach; and (2) require companies to "self-score" themselves.
A phased-evaluation approach is pretty straightforward: why review all of the proposals for all of the things if most of them aren't going to get awards anyway? The optimal approach is to review only as many proposals as you need to, structuring the evaluation as a series of downselects that progressively increases the amount of review and documentation required. Very common, very reasonable.
The self-scoring approach, though, is a bit more unusual. The basic idea is that the agency establishes a bunch of criteria, assigns potential points for each criterion, and asks companies to add up all of their points and come up with a final score. It's all just simple addition, even though it involves many criteria.
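To make the mechanics concrete, here's a toy sketch of how a self-scoring sheet works. The criteria names and point values below are invented for illustration; the actual CIO-SP4 scoring table was far longer and more detailed:

```python
# A toy self-scoring sheet. Criteria and point values are invented;
# the real CIO-SP4 table had many more line items.
CRITERIA_POINTS = {
    "federal_it_contract_experience": 400,
    "cmmi_maturity_level_3": 200,
    "iso_9001_certification": 100,
    "top_secret_facility_clearance": 100,
}

def self_score(claimed_criteria: set[str]) -> int:
    """Sum the points for every criterion the offeror claims to meet."""
    return sum(points for criterion, points in CRITERIA_POINTS.items()
               if criterion in claimed_criteria)

# The offeror checks its own boxes and adds up the points.
print(self_score({"federal_it_contract_experience", "iso_9001_certification"}))  # 500
```

Whether those boxes deserve to be checked is, of course, a different question; more on that in a moment.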
From the agency's perspective, requiring vendors to score themselves is a pretty attractive administrative shortcut. If you're going to end up ranking a bunch of vendors anyway, letting them rank themselves is so much easier to manage than trying to parse through a proposal and assign points yourself. It's kind of like asking a direct report to prepare a "self-reflection" in advance of an annual performance review: it's helpful because the direct report can tell you all of the good stuff to put into the performance review writeup.
But like a self-reflection for annual performance reviews, there really needs to be a check on the self-score. If I told my boss that, in the past fiscal year, I solved world hunger through quantum computing on the blockchain, two things would be true: (1) I'd be an asshole and (2) I'd be lying. Unfortunately, some people are just lying assholes![3] And because of that, the CIO-SP4 solicitation stated that the agency would "validate" the self-score before allowing companies to advance to the next phase of competition.
Still, validation can be a lot of work! There were 1,150 offerors, and doing a thorough scrub of all 1,150 proposals would surely take time and effort. And isn't avoiding that time and effort the point of using self-scoring in the first place?
NIH apparently didn't do it:
Our review of the SSA master tracking spreadsheet shows that of the 1,150 proposals received, only 199 of the proposals (approximately 17 percent) received an adjusted score indicative of a hard validation. Of the 433 proposals that advanced to phase 2 of the competition, 152 of those proposals (approximately 35 percent) were assigned an adjusted self‑score.
Oops! I mean, if you say you're going to validate scores, you should probably, you know, actually validate them?
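For those keeping score at home, GAO's arithmetic in that passage checks out; here's the division, restated in a couple of lines of Python:

```python
# GAO's percentages from the quote above, restated as simple division.
print(f"{199 / 1150:.1%}")  # 17.3% of all proposals showed signs of a hard validation
print(f"{152 / 433:.1%}")   # 35.1% of phase-2 proposals had an adjusted self-score
```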
And although the opinion goes into some interesting depth on the "'3-filter' mathematical analysis" used to establish "cutlines," the fun part is that GAO basically caught NIH cutting corners, not cutlines.[4]
Sure, the agency talked about using fancy math words like "differentiation" and "mode" filters. And sure, the agency made "adjustments" to the self-score. But at the end of the day, the agency apparently only analyzed "whether the submitted information supported a vendor’s allocation of its self-scored points" for fewer than 20% of the proposals.
How, you might ask, was this all discovered? That's my favorite part!
As it turns out, two companies (Karsun Solutions, LLC and Neev-KS Technologies) filed "supplemental protests" after the agency provided the record to the protestors. In those supplemental protests, the companies pointed out that NIH had used an Excel spreadsheet to develop score cutlines and that the spreadsheet "contained a number of hidden columns, including column I titled 'Validated Score,' which listed scores for some, but not all offerors!"
Hidden Excel columns? Only some vendors? Suspicious!
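As an aside: if you ever find yourself playing spreadsheet detective, hidden columns are easy to surface programmatically. Here's a minimal sketch using the openpyxl library; the filename and sheet layout are invented, since we obviously don't have NIH's actual tracking spreadsheet:

```python
# List any hidden columns in a workbook, along with their header cells.
# Requires `pip install openpyxl`. The filename is made up for illustration.
from openpyxl import load_workbook

wb = load_workbook("master_tracking_spreadsheet.xlsx")
ws = wb.active  # assume the cutline analysis lives on the first sheet

for letter, dim in ws.column_dimensions.items():
    if dim.hidden:
        header = ws[f"{letter}1"].value  # assume row 1 holds column titles
        print(f"Hidden column {letter}: {header!r}")

# With NIH's spreadsheet, this would presumably have printed something like:
# Hidden column I: 'Validated Score'
```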
The agency initially claimed that everything was fine: the hidden columns were just intended to do a "what if analysis" and weren't used to establish cutlines. Later, though, the agency contradicted itself, claiming that the hidden columns were in fact used to establish the cutlines! A-ha! Busted!
After a pretty complicated explanation, GAO found that, based on those hidden columns revealed in the supplemental protests, NIH didn't actually validate all of the offerors' self-scores. And because of those supplemental protests, GAO sustained all 98 consolidated protests! That'll certainly boost GAO's sustain rate.
Mystery solved. And NIH might have gotten away with it, too, if it weren't for those meddling protestors!
Still, I'll probably have more to say about CIO-SP4, including some funny stuff about mentor-protege joint ventures.
In the meantime, though, take a moment to wish good luck to the NIH source-selection team that now has to "hard validate" all 1,150 proposals. It's alright, folks. Take your time. We'll wait.
[1] I know that's an awkward clause. For at least a few more months, anyway, we don't talk about Bruno. And if you don't know what I mean, let's take this one up offline, mkay?
[2] It's worth reflecting on the fact that the period of performance for CIO-SP3 started in 2012! Obama was still in his first term and the, uh, "launch" of Healthcare.gov was still over a year away. Fun times!
[3] According to at least one study, more than 5% of people are prolific liars! Wild.
[4] Sick burn, amirite?