Skip to content

Conversation

@stealthybox
Copy link
Member

@stealthybox stealthybox commented Jan 16, 2026

With this change Flux can verify signatures issued from both the
cosign v2.x and v3.x CLI's when using their respective default signing settings/flags.

v2 signatures and v3 bundle signatures both function transparently.
This does require Flux to perform additional queries to the registry.

Fixes #1923
Carries on from #1924

@stealthybox
Copy link
Member Author

stealthybox commented Jan 16, 2026

This quick script:
~/hack/cosign/setup.sh

  • starts registry:2 (OCI 1.0) and zot:latest (OCI 1.1)
  • creates a matrix of flux artifacts with unique ConfigMaps
  • uploads the artifacts to their respective registry
  • signs the artifacts using the appropriate cosign binary or binaries based on their name
  • configures signature settings by name

The script also does a quick crane ls of all the tags to show the cases are producing different tag data.
When attempting manual signature verification -- not all cases are valid configurations.

Using this script, you can prime the two local registries with testdata that functions with the next commit.

#!/usr/bin/env bash

docker run -d --restart=unless-stopped -p "127.0.0.1:5558:5000" --name cosign_zot ghcr.io/project-zot/zot:latest
docker run -d --restart=unless-stopped -p "127.0.0.1:5559:5000" --name cosign_registry registry:2
sleep 2

#cosign initialize --staging

echo | COSIGN_PASSWORD= cosign generate-key-pair

function mk_config() {
    name="$1"
    port="$2"

    mkdir -p $name
    kubectl create cm --dry-run=client -oyaml $name > $name/cm.yaml
    flux push artifact --path ./$name oci://localhost:$port/$name --source dev --revision $name

    v3_args=
    if [[ $name == *"envelope"* ]]; then
        v3_args="--new-bundle-format=false --use-signing-config=false"
    fi

    if [[ $name == *"v2"* ]]; then
        echo y | COSIGN_PASSWORD= flox activate -d ~/hack/cosign-v2 -- cosign sign --key cosign.key localhost:$port/$name
    fi
    if [[ $name == *"v3"* ]]; then
        echo y | COSIGN_PASSWORD= flox activate -d ~/hack/cosign-v3 -- cosign sign $v3_args --key cosign.key localhost:$port/$name
    fi

    crane ls localhost:$port/$name
}

mk_config v2-reg 5559
mk_config v2-zot 5558

mk_config v2-v3-envelope-reg 5559
mk_config v2-v3-envelope-zot 5558
mk_config v2-v3-bundle-reg 5559
mk_config v2-v3-bundle-zot 5558

mk_config v3-envelope-reg 5559
mk_config v3-envelope-zot 5558
mk_config v3-bundle-reg 5559
mk_config v3-bundle-zot 5558



echo
echo v2-reg
crane ls localhost:5559/v2-reg
echo
echo v2-zot
crane ls localhost:5558/v2-zot

echo
echo v2-v3-envelope-reg
crane ls localhost:5559/v2-v3-envelope-reg
echo
echo v2-v3-envelope-zot
crane ls localhost:5558/v2-v3-envelope-zot
echo
echo v2-v3-bundle-reg
crane ls localhost:5559/v2-v3-bundle-reg
echo
echo v2-v3-bundle-zot
crane ls localhost:5558/v2-v3-bundle-zot

echo
echo v3-envelope-reg
crane ls localhost:5559/v3-envelope-reg
echo
echo v3-envelope-zot
crane ls localhost:5558/v3-envelope-zot
echo
echo v3-bundle-reg
crane ls localhost:5559/v3-bundle-reg
echo
echo v3-bundle-zot
crane ls localhost:5558/v3-bundle-zot

@stealthybox
Copy link
Member Author

stealthybox commented Jan 16, 2026

WIP tests pushed in bb1c8fb.

It passes locally! 🙂
When twiddling with it, we can also produce nice signature/bundle verification error messages, so it should also be easy to produce sad-path tests.

They are not expected to pass in CI.
The public key needs to read from the setup.sh in the previous comment and the local registries need to be running with the proper artifact and signature fixtures.

I tried using Zot pkgs to create an OCI 1.1 compliant registry:

    "zotregistry.dev/zot/v2/pkg/api"
    "zotregistry.dev/zot/v2/pkg/api/config"

but they pull in a lot of undesirable dependencies, require replaces, and the real kicker is some of the deps need CGO, so I ditched the effort of adding another testregistry.NewZot()

@stealthybox
Copy link
Member Author

Looks like i missed this, need to fix:
https://github.com/fluxcd/source-controller/actions/runs/21077738289/job/60623771643#step:3:423

# [github.com/fluxcd/source-controller/internal/controller]
Error: vet: internal/controller/helmchart_controller_test.go:3477:102: not enough arguments in call to sign.SignCmd
	have (*"github.com/sigstore/cosign/v3/cmd/cosign/cli/options".RootOptions, "github.com/sigstore/cosign/v3/cmd/cosign/cli/options".KeyOpts, "github.com/sigstore/cosign/v3/cmd/cosign/cli/options".SignOptions, []string)
	want (context.Context, *"github.com/sigstore/cosign/v3/cmd/cosign/cli/options".RootOptions, "github.com/sigstore/cosign/v3/cmd/cosign/cli/options".KeyOpts, "github.com/sigstore/cosign/v3/cmd/cosign/cli/options".SignOptions, []string)
make: *** [Makefile:129: vet] Error 1
Error: Process completed with exit code 2.

Copy link
Member

@matheuscscp matheuscscp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

@stealthybox stealthybox force-pushed the cosign-v3-bundles branch 2 times, most recently from 2f282b2 to 8f083f5 Compare January 17, 2026 01:22
@stefanprodan
Copy link
Member

This is blocking fluxcd/flux2#5676 we'll need to wait for the merge before Kubernetes 1.35 upgrade.

Signed-off-by: leigh capili <leigh@null.net>
@stealthybox stealthybox force-pushed the cosign-v3-bundles branch 4 times, most recently from b1b79bc to f82d33d Compare January 24, 2026 07:08
@stealthybox stealthybox force-pushed the cosign-v3-bundles branch 2 times, most recently from 40fb266 to 169c369 Compare January 24, 2026 14:53
@stefanprodan
Copy link
Member

@stealthybox I see lots of UNAUTHORIZED: authentication required did you forgot to make the new OCI repos public?

@stealthybox
Copy link
Member Author

stealthybox commented Jan 26, 2026

did you forgot to make the new OCI repos public?

yes I forgot to expose them.
I didn't initially notice because things were working locally with my GHCR read token.

I've now made those packages public.
I did have to enable public package creation in the org settings first before making the individual packages public.

Unrelated -- for the private key signatures -- I was struggling to get CI to sign them in the final step:
https://github.com/fluxcd-testing/cosign-testing/actions/runs/21310278055/job/61345437054
Maybe there's a bug with cosign and unencrypted private keys when there is no tty.
I ended up just signing those specific artifacts with my laptop.

@stealthybox
Copy link
Member Author

Looks like I'm still missing some options to support keyless sig verification
31

      message: 'chart verification error: failed to verify oci://ghcr.io/fluxcd-testing/cosign-testing/v3/charts/podinfo:6.9.4:
        no matching attestations: a trusted root is required for identity-based verification'

@stealthybox
Copy link
Member Author

stealthybox commented Jan 26, 2026

well, I don't know why the same options aren't working from the v2 case, but at least it's easy to replicate with my older unit tests:

--- FAIL: TestPKBundleAttestationsKeyless (2.55s)
    --- FAIL: TestPKBundleAttestationsKeyless/v3_fluxcd-testing_chart (2.55s)
        /Users/stealthybox/repos/fluxcd/source-controller/internal/oci/cosign/cosign_test.go:306: 
            Unexpected error:
                <*cosign.ErrNoMatchingAttestations | 0x14000909280>: 
                no matching attestations: a trusted root is required for identity-based verification
                {
                    err: <*fmt.wrapError | 0x140000c9da0>{
                        msg: "no matching attestations: a trusted root is required for identity-based verification",

The first succeeds with our SDK usage, and the second one fails:

		{
			name:   "v2 podinfo chart",
			tagURL: "ghcr.io/stefanprodan/charts/podinfo:6.9.4",
		},
		{
			name:   "v3 fluxcd-testing chart",
			tagURL: "ghcr.io/fluxcd-testing/cosign-testing/v3/charts/podinfo:6.9.4",
		},

however, both of these cosign commands pass:

cosign verify --certificate-oidc-issuer 'https://token.actions.githubusercontent.com' --certificate-identity-regexp "^https://github\.com/stefanprodan/podinfo/\.github/workflows/release\.yml@refs/tags/.*" ghcr.io/stefanprodan/charts/podinfo:6.9.4

cosign verify --certificate-oidc-issuer 'https://token.actions.githubusercontent.com' --certificate-identity-regexp "^https://github\.com/fluxcd-testing/cosign-testing/\.github/workflows/release\.yml@refs/tags/.*" ghcr.io/fluxcd-testing/cosign-testing/v3/charts/podinfo:6.9.4

so there's some option or behavior I'm missing when using VerifyImageAttestations() vs VerifyImageSignatures()

Comment on lines 125 to 173
// Initialize TrustedMaterial for v3/Bundle verification
if checkOpts.TrustedMaterial, err = cosign.TrustedRoot(); err != nil {
return nil, fmt.Errorf("unable to initialize trusted root: %w", err)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this gets v3 keyless verification work on my local units.
I expect the e2e to make some progress now.

This defaults to refreshing the TUF root automatically every 24 hours, and it's override-able with an env var.

Copy link
Member Author

@stealthybox stealthybox Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing unit test with v2 and v3 keyless:

func TestPKBundleAttestationsKeyless(t *testing.T) {
	tests := []struct {
		name       string
		tagURL     string
		identities []cosign.Identity
	}{
		{
			name:   "v2 podinfo chart",
			tagURL: "ghcr.io/stefanprodan/charts/podinfo:6.9.4",
		},
		{
			name:   "v3 fluxcd-testing chart",
			tagURL: "ghcr.io/fluxcd-testing/cosign-testing/v3/charts/podinfo:6.9.4",
			identities: []cosign.Identity{
				{
					IssuerRegExp:  `^https://token\.actions\.githubusercontent\.com$`,
					SubjectRegExp: `^https://github\.com/fluxcd-testing/cosign-testing/\.github/workflows/release\.yml@refs/tags/.*`,
				},
			},
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			g := NewWithT(t)

			ctx := context.Background()

			// tagURL := fmt.Sprintf(tag, registryAddr)
			ref, err := name.ParseReference(tt.tagURL)
			g.Expect(err).NotTo(HaveOccurred())

			transport := http.DefaultTransport.(*http.Transport).Clone()
			// transport.Proxy = http.ProxyURL(tt.proxyURL)

			var opts []Options
			opts = append(opts, WithRemoteOptions(remote.WithTransport(transport)))
			opts = append(opts, WithIdentities(tt.identities))

			verifier, err := NewCosignVerifier(ctx, opts...)
			g.Expect(err).NotTo(HaveOccurred())

			_, err = verifier.Verify(ctx, ref)
			g.Expect(err).NotTo(HaveOccurred())
		})
	}
}

@stealthybox
Copy link
Member Author

stealthybox commented Jan 26, 2026

It's notable that identities are required for verification with v3.
They were optional with v2.

We should probably add some sad-path tests here, so that we understand what the UX is for a Flux user who may not know what the exact string their trying to match for

    matchOIDCIdentity:
    - issuer: ^https://token\.actions\.githubusercontent\.com$
      subject: ^https://github\.com/fluxcd-testing/cosign-testing/\.github/workflows/release\.yml@refs/tags/.*

In order to write these Regex's, I needed to grep the debug logs from the cosign CLI.

Ideally an error message from the verifier would surface in the reconciler and tell you what the Issuer and SAN's are?

I think the cosign CLI should also help the user so much more here.
The bundle format output is much more minimal than the v2 signature output.

@stealthybox stealthybox force-pushed the cosign-v3-bundles branch 2 times, most recently from a1c0226 to fdfaeca Compare January 26, 2026 08:53
@stealthybox
Copy link
Member Author

well, e2e passes, and GO_TEST_PREFIX=TestOCIRepository_reconcileSource_verifyOCISourceSignature_keyless make test-ctrl passes locally.

just checked, and the runner is using an up-to-date ca-certificates as well.
I'm not sure why refreshing the TUF bundle is failing in the test workflow.

Gonna take a break.

v2 signatures and v3 bundled signatures both function transparently.
This does require additional queries to the registry.

Signed-off-by: leigh capili <leigh@null.net>
Signed-off-by: leigh capili <leigh@null.net>
Signed-off-by: leigh capili <leigh@null.net>
Natation signing was previously relying on the
TLSConfig being added to the http.DefaultTransport
as a side-effect

Messing with the http.DefaultTransport was causing
TLS verification failures for internet requests to
the TUF repo.

Signed-off-by: leigh capili <leigh@null.net>
@stealthybox stealthybox force-pushed the cosign-v3-bundles branch 2 times, most recently from 0006cd1 to 36af7f5 Compare January 28, 2026 10:44
1. use global DNS proxy instead of a mock resolver
2. fix data race and broken DefaultResolver
3. add previously missing test registry shutdowns

Signed-off-by: leigh capili <leigh@null.net>
Signed-off-by: leigh capili <leigh@null.net>
@stealthybox
Copy link
Member Author

stealthybox commented Jan 28, 2026

wow, that was gnarly...
broken default resolver and http tls configs, unclosed registry servers causing port conflicts, a data race between tests, and some sync to button it all up.
I made small notes in the commit messages.

this is ready for a review

the TUF fetcher updates the "live trusted root" in the background every 24hrs by default -- forgot to leave a comment in the code about that

We can also add more test cases

  • failing v3 bundle verification error messages
  • v2 + v3 sigs on the same artifact (can even mismatch keys)
  • maybe some failure injection to test the lifecycle of the trustedroot factory as well

none of these necessarily have to block the merge though
this is definitely working, and this unblocks our dependency upgrades

@stefanprodan
Copy link
Member

the TUF fetcher updates the "live trusted root" in the background every 24hrs by default

This should not happen unless there are sources that make use of cosign.

Copy link
Member

@matheuscscp matheuscscp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for working on this! 🙏

// TODO(stealthybox): it would be nice to control the http client here for the TrustedRoot fetcher
// with the current state of this part of the cosign SDK, that would involve duplicating a lot of
// their ENV, options, and defaulting code.
f.trustedMaterial, f.initErr = cosign.TrustedRoot()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any harm in calling cosign.TrustedRoot() in parallel from a few goroutines when it's time to refresh, to avoid contention? We do a very similar thing like this here:

https://github.com/controlplaneio-fluxcd/flux-operator/blob/c01c37991fab8d8aed7a1e6090937825859c5704/internal/web/auth/oidc.go#L47-L76

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stefanprodan Is this what you meantioned in the dev meeting about the trust root fetching not being thread-safe? If yes, I think we can resolve this comment

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@stealthybox stealthybox Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I understanding your suggestion is to allow the other goroutines to create TrustedRoots before the lock and then race to lock and set f.trustedMaterial?
( benefit being if one goroutine is slow for some reason, another could proceed more quickly... I suppose it could happen, but it feels unlikely for this operation. )

The live-trust-root does already use a mutex internally that is precisely timed to the background daily fetch, so contention would only occur on the construction of it.

One major snag of potentially making multiple instances of live-trust-roots is we cannot really manage their lifecycle.
sigstore-go controls the http client and doesn't actually plumb through any Context, so if we create them, we can't cleanly kill the metadataFetcher/Updater tickers, and their goroutines will just continue forever. The ticker is not passed to anything even internally in the library:
https://github.com/sigstore/sigstore-go/blob/18f941817a298f6443f70e16c82f306a52928977/pkg/root/trusted_root.go#L481-L526

@stefanprodan made a good point when we synced earlier, that this implementation lazily creates this autoUpdating root when someone starts using cosign keyless verification, but nothing is going to stop the auto-updater until a Pod restart.

At some point, we will probably need a map of these, keyed to their NamespacedSecret for supporting custom Rekor and Fulcio roots.
Because of this, I anticipate needing to work with the cosign and sigstore-go maintainers upstream to make this SDK surface more extensible.

For instance, here's the additional env and defaulting logic in cosign -- they're all private functions, and opts is not exposed to the top-level call to TrustedRoot():
https://github.com/sigstore/cosign/blob/71291bec3bc6f822b2fd57317cde937214cf1539/pkg/cosign/tuf.go#L29

At least for now, global shared instance defaults can overridden by patching the ENV and directories of source-controller.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the trust root fetching not being thread-safe? If yes, I think we can resolve this comment

See theupdateframework/go-tuf@d2dbc18

regarding thread-safety, the TrustedRoot struct doesn't have these same warnings, but the getter functions do just return maps for the Rekor and CT Logs which could case some strange or eventually consistent behavior?
https://github.com/sigstore/sigstore-go/blob/18f941817a298f6443f70e16c82f306a52928977/pkg/root/trusted_root.go#L39-L78

This could be worth reviewing ^

LiveTrustedRoot wraps and swaps the TrustedRoot with a mutex.
This seems fine to me.

@stealthybox
Copy link
Member Author

the TUF fetcher updates the "live trusted root" in the background every 24hrs by default

This should not happen unless there are sources that make use of cosign.

Confirming that the Factory for the TUF updater is implemented to not initialize until a cosign keyless verifier is specifically created in a reconcile loop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unable to verify signature from cosign v3.x

4 participants