CC-BY-NC is out. Does Cohere's '4-bit lossless' claim hold?

Command A+: Apache 2.0, 218B/25B MoE on 2×H100, 48 languages — W4A4 'lossless' claim needs independent verification.


For two years, the catch with Cohere's open weights was a four-letter clause: -NC. On May 20, 2026, that clause disappeared — and with it, the main reason enterprise teams kept Command models in the "evaluate, never ship" pile.

Apache 2.0 Arrived — The Practical Difference From CC-BY-NC

Command A+ (model ID command-a-plus-05-2026) is Cohere's first open-weights release under Apache 2.0, an OSI-approved permissive license that allows commercial use, fine-tuning, redistribution, and self-hosting with no fees and no non-compete clause. That is a hard break from Cohere's earlier open releases (the Command R/Command A family and the Aya models), which shipped under CC-BY-NC, where the "NC" (non-commercial) term blocked commercial use, revenue-generating fine-tunes, and redistribution unless you negotiated a separate license.

Concretely, three previously-blocked things are now permitted without a licensing conversation:

  • Ship a commercial product built on the weights — no per-seat or per-deployment fee.
  • Fine-tune for revenue — adapt the model on your own data and monetize the result.
  • Redistribute — fork, repackage, or bundle the weights into your own stack.

The non-commercial clause was the single documented friction point that stalled enterprise evaluations of Cohere's open weights; removing it removes that gate. Cohere frames the change around sovereign and critical-infrastructure deployment: governments and regulated enterprises that need to run a frontier model on-premises or air-gapped, without depending on US cloud APIs. As the company puts it, the release is "built for sovereign and critical infrastructure" deployment (source: Cohere via BusinessWire, 2026-05).

What does not change is worth stating plainly. The Apache 2.0 grant covers the model weights, not Cohere's hosted API, whose commercial and rate-limit terms remain a separate agreement. And the grant's practical value depends on the weights actually being downloadable — a point where the source reporting diverges, and one we return to in the final section.

Cohere's Expert-Routing Design: 218B Stored, 25B in Each Forward Pass

Command A+ is a sparse Mixture-of-Experts (MoE) model: it stores 218B total parameters but activates only 25B of them per token, roughly 11% of the total on each forward pass. That routing is the central design change from the prior Command A, a 111B dense model in which every parameter fires on every token. The practical effect is that per-token compute is decoupled from total capacity: you pay compute for 25B active parameters, not 218B.

This is why headline parameter counts mislead. "218B" describes how much the model can store across its expert pool, not how much it computes per step. Comparing it head-to-head against a 218B dense model overstates the cost, and comparing it against a 25B dense model understates the capacity. This is Cohere's first MoE release, and the sparse structure is what lets the weights spread across many specialized experts while keeping the active path small.
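The gap between stored and computed parameters can be put in numbers. The sketch below uses the parameter counts quoted above; the rule of roughly 2 FLOPs per active parameter per token is a common approximation assumed here for illustration, not a figure from Cohere.

```python
# Parameter counts quoted in the article.
TOTAL_PARAMS = 218e9      # Command A+ stored parameters
ACTIVE_PARAMS = 25e9      # Command A+ parameters used per token
DENSE_COMMAND_A = 111e9   # prior Command A, all parameters active


def active_fraction(active: float, total: float) -> float:
    """Share of stored parameters that participate in one forward pass."""
    return active / total


def approx_flops_per_token(active_params: float) -> float:
    """Rough forward-pass cost: ~2 FLOPs per active parameter (assumed rule of thumb)."""
    return 2 * active_params


frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
moe_flops = approx_flops_per_token(ACTIVE_PARAMS)
dense_flops = approx_flops_per_token(DENSE_COMMAND_A)

print(f"active fraction: {frac:.1%}")
print(f"per-token compute vs dense Command A: {moe_flops / dense_flops:.2f}x")
```

Under that approximation, per-token compute tracks the 25B active path, while memory still has to hold all 218B parameters because any expert may be selected for the next token.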

One trade-off cuts against migration. Command A+ offers a 128K-token input window and a 64K-token maximum output, narrower than the 256K input the dense Command A shipped with. Teams running long-document RAG on Command A should confirm their context budget fits inside 128K before switching: a single large contract or filing that previously loaded whole may now need chunking.
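A quick pre-migration check is to estimate whether existing documents fit the smaller window and, if not, how many pieces they would need. The sketch below is a rough illustration only: the 4-characters-per-token ratio and the `reserved_tokens` allowance for prompt and instructions are assumptions, and a real check should count tokens with the model's own tokenizer.

```python
MAX_INPUT_TOKENS = 128_000  # Command A+ input window quoted in the article
CHARS_PER_TOKEN = 4         # assumed heuristic, not Cohere's tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count using ceiling division on character length."""
    return -(-len(text) // CHARS_PER_TOKEN)


def chunk_for_window(text: str, reserved_tokens: int = 8_000) -> list[str]:
    """Split text into pieces whose estimated size fits the input window.

    reserved_tokens is a hypothetical allowance for the system prompt and question.
    """
    budget_tokens = MAX_INPUT_TOKENS - reserved_tokens
    if budget_tokens <= 0:
        raise ValueError("reserved_tokens leaves no room for document text")
    max_chars = budget_tokens * CHARS_PER_TOKEN
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


document = "x" * 1_000_000  # stand-in for a long filing (~250K estimated tokens)
chunks = chunk_for_window(document)
print(estimate_tokens(document), len(chunks))
```

Fixed-width splitting is the bluntest option; splitting on section or paragraph boundaries usually preserves more meaning for retrieval.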

Language and recency coverage move the other way. Command A+ supports 48 languages, including all official European Union languages, more than double the 23 of earlier Command A models. Its knowledge cutoff is April 1, 2025.

Spec | Command A (dense) | Command A+ (MoE)
Parameters | 111B, all active | 218B total / 25B active
Input context | 256K tokens | 128K tokens
Max output | (not stated) | 64K tokens
Languages | 23 | 48 (all EU official)
Knowledge cutoff | (not stated) | April 1, 2025

B200 or 2×H100: Unpacking the 4-Bit Compression Claim

Command A+ ships in three numeric formats, and the smallest one is the reason it fits on modest hardware. Cohere distributes the weights in BF16 (full 16-bit precision), FP8 (8-bit), and W4A4 (4-bit weights and 4-bit activations). The W4A4 build is what lets the model run on a single NVIDIA B200 or two NVIDIA H100 GPUs. Combined with the MoE sparsity covered above (25B active parameters per token), the quantization is the second half of the footprint story.

The comparison that makes this concrete: Llama 3.1 405B, a dense model of broadly comparable capability, needs eight or more H100s at FP16. Sparse routing plus W4A4 is how Cohere gets to a two-GPU deployment instead of a cluster. Cohere also claims up to 110% higher throughput and 30% lower latency than Command A Reasoning.
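The footprint arithmetic behind the two-GPU figure can be sketched directly. The snippet below counts weight bytes only and ignores quantization scales, KV cache, and runtime overhead; the GPU capacities (80 GB per H100, 192 GB per B200) and the 10% headroom are assumptions for illustration, not figures from Cohere's documentation.

```python
TOTAL_PARAMS = 218e9  # all experts must be resident, not just the 25B active ones

# Bytes per parameter for each distributed format.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "W4A4": 0.5}

# Assumed total memory per configuration, in GB.
GPU_MEMORY_GB = {"1xB200": 192, "2xH100": 160}


def weight_gb(params: float, fmt: str) -> float:
    """Approximate weight storage in GB for a given numeric format."""
    return params * BYTES_PER_PARAM[fmt] / 1e9


def fits(params: float, fmt: str, gpu: str, headroom: float = 0.10) -> bool:
    """True if the weights fit while keeping `headroom` of memory free (assumed margin)."""
    return weight_gb(params, fmt) <= GPU_MEMORY_GB[gpu] * (1 - headroom)


for fmt in BYTES_PER_PARAM:
    sizes = {gpu: fits(TOTAL_PARAMS, fmt, gpu) for gpu in GPU_MEMORY_GB}
    print(f"{fmt}: {weight_gb(TOTAL_PARAMS, fmt):.0f} GB of weights, fits: {sizes}")
```

On these assumptions only the W4A4 build (about 109 GB of weights) fits either configuration, which matches the article's point that quantization, not sparsity alone, is what shrinks the memory footprint.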

The contested word is "lossless." Cohere states W4A4 produces virtually no perplexity degradation — a strong assertion, because quantizing activations to 4 bits typically costs more quality than weight-only 4-bit schemes like GPTQ or AWQ. Activations carry dynamic, outlier-heavy values that resist aggressive rounding, so a clean W4A4 result would be a genuine engineering claim, not a routine one.
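A toy example shows why outliers make 4-bit activations hard. The sketch below applies generic symmetric per-tensor round-to-nearest quantization, which is an illustrative scheme chosen here and not Cohere's unpublished method, to the same small values with and without a single large outlier.

```python
def quantize_dequantize_int4(values: list[float]) -> list[float]:
    """Symmetric per-tensor 4-bit round-to-nearest (generic scheme, not Cohere's recipe)."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    scale = max_abs / 7  # map the largest magnitude to the top positive int4 level
    result = []
    for v in values:
        q = max(-8, min(7, round(v / scale)))  # clamp to the signed 4-bit range
        result.append(q * scale)
    return result


def mean_abs_error(values: list[float]) -> float:
    """Average absolute difference between the inputs and their 4-bit reconstruction."""
    restored = quantize_dequantize_int4(values)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)


smooth = [0.1 * i for i in range(-7, 8)]  # evenly spread values in [-0.7, 0.7]
spiky = smooth + [50.0]                   # same values plus one outlier

print(f"no outlier:   {mean_abs_error(smooth):.4f}")
print(f"with outlier: {mean_abs_error(spiky):.4f}")
```

With the outlier present, the scale stretches to cover 50.0 and every small value rounds to zero. Schemes that claim near-lossless W4A4 have to handle this somehow, which is why the unpublished recipe matters.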

It is also one you cannot yet check. The recipe — kernel design, calibration data, rounding method — has not been published.

"The W4A4 quantization method has not been fully published, which limits reproducibility, so the 'lossless' claim remains vendor-attestable rather than independently reproduced." — analysis by VentureBeat

For a developer, the practical read is: download the W4A4 weights, run your own perplexity and task evals against the BF16 build, and treat the lossless framing as a hypothesis to test on your workload — not a settled result.
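One way to frame that test is a perplexity comparison on the same held-out text. The sketch below assumes you have already collected per-token natural-log probabilities from each build; the sample numbers are placeholders, not measurements, and the 1% tolerance is an arbitrary threshold chosen for illustration.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def relative_degradation(ppl_reference: float, ppl_quantized: float) -> float:
    """Fractional perplexity increase of the quantized build over the reference."""
    return (ppl_quantized - ppl_reference) / ppl_reference


def passes_lossless_check(ref_logprobs, quant_logprobs, tolerance: float = 0.01) -> bool:
    """True if quantized perplexity is within `tolerance` of the reference (assumed threshold)."""
    ref = perplexity(ref_logprobs)
    quant = perplexity(quant_logprobs)
    return relative_degradation(ref, quant) <= tolerance


# Placeholder log-probabilities for the same four tokens under each build.
bf16_logprobs = [-1.20, -0.35, -2.10, -0.80]
w4a4_logprobs = [-1.22, -0.36, -2.15, -0.81]

print(passes_lossless_check(bf16_logprobs, w4a4_logprobs))
```

Perplexity is a coarse signal, so pair it with task-level evals on your own workload before drawing conclusions either way.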

τ²-Bench 85%, Terminal-Hard 25%: Where Independent Checks Are Absent

Every benchmark Cohere published for Command A+ is self-reported, and none had an independent leaderboard position at the time of writing. Cohere's figures show τ²-Bench Telecom at 85% versus 37% for Command A Reasoning, Terminal-Bench Hard agentic coding at 25% versus 3%, MMMU Pro at 63%, and MathVista at 80.6%. Read these as vendor attestations, not settled rankings.

The agentic coding jump is the one to scrutinize. Going from 3% to 25% on Terminal-Bench Hard is the largest relative gain in the set, and it is also the metric with the fewest external cross-checks currently available. A 22-point absolute move on a hard agentic benchmark is plausible for an architecture shift, but it is exactly the kind of result that benefits most from third-party replication on held-out task sets.
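The difference between absolute and relative gain is easy to check from the two score pairs Cohere reported. The snippet below is plain arithmetic on those self-reported figures.

```python
# Self-reported scores in percent: (Command A Reasoning, Command A+).
SCORES = {
    "tau2-Bench Telecom": (37.0, 85.0),
    "Terminal-Bench Hard": (3.0, 25.0),
}


def gains(old: float, new: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative multiple new/old)."""
    return new - old, new / old


for name, (old, new) in SCORES.items():
    points, multiple = gains(old, new)
    print(f"{name}: +{points:.0f} points, {multiple:.1f}x")
```

τ²-Bench Telecom moves more points (48 versus 22), but Terminal-Bench Hard shows the larger multiple (about 8.3x versus 2.3x) because its baseline is so low. A low baseline also means a small number of tasks can swing the ratio, which is one more reason to want replication.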

Benchmark | Command A+ (self-reported) | Command A Reasoning | Independent check
τ²-Bench Telecom | 85% | 37% | None at time of writing
Terminal-Bench Hard (agentic coding) | 25% | 3% | None at time of writing
MMMU Pro | 63% | (not stated) | None at time of writing
MathVista | 80.6% | (not stated) | None at time of writing
Artificial Analysis Intelligence Index | 37 | (not stated) | Appears vendor-submitted, not independently collected

The Artificial Analysis Intelligence Index score of 37 deserves a footnote of its own. It is listed among Cohere's numbers, but the value appears to be Cohere's own submission rather than an independently collected third-party rating. That distinction matters: a vendor-run index entry carries less weight than a rating an external lab measures under its own harness.

For a developer deciding whether to trust these figures, the practical move is to wait for outside signal. Watch Artificial Analysis, LMSYS Chatbot Arena, and livebench.ai over the coming weeks; if Command A+ lands near its claimed numbers on those, the self-reported set gains credibility. Until then, run the agentic and reasoning evals that match your workload rather than adopting the blog figures as given.

The Priority Audience: Sovereign AI, Air-Gap Environments, and Multilingual Stacks

The clearest beneficiaries of Command A+ are organizations that cannot route inference through a US cloud API: national governments, defense and intelligence stacks, and regulated enterprises that need a frontier model running on-premises or fully air-gapped. Cohere positions the model explicitly for this "sovereign" and critical-infrastructure use case, and the Apache 2.0 license is what makes it viable: self-hosting, fine-tuning, and shipping commercial products on the weights all become legal without a non-compete clause or per-seat licensing negotiation.

The second draw is auditability. Command A+ models retrieval and generation jointly, so factual claims map back to source documents at generation time rather than being stitched on afterward by a separate citation pass. For legal, medical, and financial RAG, where an unsourced assertion is a liability and not just a quality issue, native grounding is the feature that justifies the evaluation effort.

"Native citation grounding aims at auditable RAG in legal, medical, and financial workflows, where retrieval and generation are jointly modeled so factual claims map to source documents," per Cohere's release framing (source: Cohere).

Third is language breadth. Command A+ covers 48 languages including every official EU language, more than double the 23 of earlier Command A models. That makes it the first Cohere model with enough multilingual parity to be a serious candidate for EU compliance scenarios that demand consistent behavior across member-state languages, not just English plus a handful of majors.

Finally, packaging matters for builders. The single ID command-a-plus-05-2026 consolidates vision, reasoning, multilingual, tool use, and citation capabilities that previously shipped as separate Command A variants. One endpoint, one weight set to host: meaningfully less integration surface for a team standing up an air-gapped agent than juggling four specialized checkpoints.

What the Distribution Gaps Mean If You're Not on Azure

Where you can actually call Command A+ is narrower than the license suggests. At launch, only two managed paths are confirmed: Cohere's own API across all tiers, and Microsoft Azure AI Foundry. Cohere's model overview lists command-a-plus-05-2026 as N/A on Amazon Bedrock, Amazon SageMaker, and Oracle OCI, with no published timeline for adding them. For an AWS-primary or OCI-primary stack, the Apache 2.0 headline does not translate into a one-line endpoint swap.

The hosted-access terms tighten the picture further. Trial keys are capped at 20 requests per minute and 1,000 API calls per month, and production pricing is not published as a per-token rate: the docs route production usage to Model Vault or private deployment via sales@cohere.com rather than a posted figure. So even the managed route asks for a sales conversation before any serious volume.
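If you evaluate on a trial key, a small client-side throttle keeps a test script inside both caps. The class below is a hypothetical helper written for this article, not part of any Cohere SDK; it tracks a 60-second sliding window and a monthly counter that the caller is responsible for resetting.

```python
import time
from collections import deque


class TrialKeyThrottle:
    """Client-side guard for the trial caps quoted above (hypothetical helper)."""

    def __init__(self, per_minute=20, per_month=1000, clock=time.monotonic):
        self.per_minute = per_minute
        self.per_month = per_month
        self.clock = clock          # injectable so the logic can be tested without sleeping
        self.recent = deque()       # timestamps of calls inside the last 60 seconds
        self.used_this_month = 0    # caller must reset this at the start of each month

    def seconds_until_allowed(self) -> float:
        """Return 0.0 if a call may go now, else the wait in seconds."""
        if self.used_this_month >= self.per_month:
            raise RuntimeError("monthly trial budget exhausted")
        now = self.clock()
        while self.recent and now - self.recent[0] >= 60:
            self.recent.popleft()   # drop calls that have left the window
        if len(self.recent) < self.per_minute:
            return 0.0
        return 60 - (now - self.recent[0])

    def record_call(self) -> None:
        """Register one request that was actually sent."""
        self.recent.append(self.clock())
        self.used_this_month += 1
```

A typical loop would call `seconds_until_allowed()`, sleep for the returned value, send the request, then call `record_call()`. At 1,000 calls per month the monthly cap, not the per-minute cap, is what limits any sizeable eval run.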

If you are not on Azure and cannot self-host, that leaves two contingencies. One: wait for a Bedrock, SageMaker, or OCI listing that Cohere has not committed to. Two: provision your own cluster, which is viable because the model fits on a single B200 or two H100s, but contingent on the weights being publicly downloadable, which remains unsettled. Teams that need a commercial deployment this quarter should treat the Cohere API or Azure as the default and scope a private-cluster fallback before betting a roadmap on a Bedrock date that does not yet exist.

The W4A4 Opacity Problem: Apache 2.0 Declared, Independent Reproduction Absent

Apache 2.0 was declared for Command A+ on May 20, 2026, but a license grant is not the same as a downloadable artifact. As of writing, no Hugging Face Hub listing for the command-a-plus-05-2026 checkpoints has been independently confirmed, and the two release briefs themselves diverge on whether public weights were actually posted. "Apache 2.0 license on an API-served model" permits commercial use in principle; "weights you can pull and run air-gapped today" is the thing sovereign-AI and air-gap buyers are actually paying for. Those are not interchangeable, and the gap between them is where the open-source claim is most contested.

Two unresolved disclosure problems compound this. First, Cohere's own documentation is inconsistent: the Command A page renders the command-a-plus-05-2026 model ID while the surrounding text still describes the older 111B-parameter, 256K-context Command A that the model overview lists separately as command-a-03-2025. That mismatch is unreconciled at the time of writing and matters for anyone scripting against a specific ID. Second, the W4A4 recipe (calibration set, activation-quantization scheme, and inference kernel) is not fully published, so the "lossless" 4-bit claim that underpins single-B200 and two-H100 deployment remains vendor-attestable rather than independently reproduced.

As VentureBeat's analysis frames it, "the W4A4 quantization method has not been fully published, limiting reproducibility," which leaves the lossless framing as an attestation rather than a verified result (source: VentureBeat, 2026-05).

The concrete takeaway: treat Command A+ as a strong managed-API and Azure option you can build commercial products on now, and treat the self-hostable open-weights story as conditional. Before committing a roadmap to on-prem or air-gapped deployment, confirm three things — a published checkpoint with a verifiable Apache 2.0 license file, a reconciled model ID in Cohere's docs, and a W4A4 method you or a third party can reproduce. Until those land, the license is real but the reproduction is absent, and that distinction should set your timeline.

Last updated: 2026-06-24. Reviewed against Cohere's May 20, 2026 release materials and independent trade coverage available at the time of writing.

Frequently asked questions

What does the Apache 2.0 license change if my team is already using the Cohere API?

For API-only teams, nothing changes today. Apache 2.0 covers the model weights, while Cohere's hosted API operates under separate, unchanged terms of service. The practical shift is optional future leverage: the prior CC-BY-NC non-commercial restriction is gone, so if you later want to self-host Command A+ and ship a commercial product on it, the commercial barrier no longer exists. Until you self-host, your current API usage and billing are unaffected.

Can Command A+ run on a single NVIDIA H100?

No. The minimum documented configuration is two NVIDIA H100 80GB GPUs (at W4A4) or a single NVIDIA B200. A single H100 is not a listed deployment target. This is still well below the 8+ H100s typically required for dense models of comparable capability such as Llama 3.1 405B, but plan capacity around the 2×H100 floor, not a single card.

Are Command A+ weights available to download on Hugging Face?

As of May 2026, this is unconfirmed. Cohere declares the model under Apache 2.0, but independent confirmation that public downloadable checkpoints (for example, on Hugging Face) were actually posted has not been reported. A permissive license declaration is not the same as published weights. Check Cohere's Hugging Face organization page directly and verify an Apache 2.0 license file before committing to a self-hosted setup.

How does Command A+ context length compare to Command A?

It is shorter. Command A+ offers a 128K-token input context with a 64K-token maximum output, whereas the March 2025 Command A shipped a 256K-token context window. If you run long-document RAG on Command A today, factor in the halved input window before switching: workflows that depend on packing very large documents into a single call may need chunking or retrieval changes.

What is W4A4 quantization and why is Cohere's 'lossless' claim contested?

W4A4 quantizes both weights and activations to 4 bits, a more aggressive approach than weight-only 4-bit methods where quality degradation is better characterized. Cohere claims virtually no perplexity or quality loss at W4A4, the compression that enables single-B200 or 2×H100 inference. The claim is contested because the W4A4 recipe has not been fully published, which limits reproducibility, so 'lossless' remains vendor-attestable rather than independently verified.