AWS CodeArtifact

CodeArtifact has been part of AWS's DevOps tool set since mid-2020, but I hadn't explored it until about six months ago. Being the lazy person I am, I explore things only when I have to, and this case was no different.

A few days ago I gave a brief presentation on AWS CodeArtifact at AWSMeetup Auckland. This post is my attempt to share that discussion here.

What is AWS CodeArtifact?

It is a managed package management tool (like JFrog's Artifactory, Sonatype's Nexus etc.). 'Managed' is the key - you don't need to maintain the tool yourself and you don't have to worry about scaling as your packages grow in number and size. Obviously it comes at a cost, but it is "pay as you go", much like many other AWS services, with no upfront cost whatsoever. And if you think it is not meeting your expectations, delete it and be happy again!

The buzz words

The two key buzz words with AWS CodeArtifact are Domain and Repository.

A domain is how you organize your repositories. In layman's terms, think of it as a 'folder' or 'directory' of repositories. Each domain can have one or more repositories. Access to a domain is controlled by a 'Domain Policy' (much like an S3 bucket policy or a KMS key policy).

A repository is where you store your packages/artifacts. Each repository is part of a domain and has its own individual repository policy to control access.

And then there is a third one, Upstream. An upstream represents the public repository (or repositories), such as PyPI or Maven Central, from which your repositories may download dependencies for you.
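To make the relationship between the three concrete, here is a minimal sketch that prints the AWS CLI commands for creating a domain, creating a repository inside it and connecting that repository to a public upstream. The names engineering-dev and dev-web are borrowed from the login example later in this post, and public:pypi is assumed as the upstream; adjust all three to your own setup. The script only prints the commands, it does not run them.

```python
import shlex
from typing import List


def setup_commands(domain: str, repository: str, external_connection: str) -> List[List[str]]:
    """Return the CLI commands (as argument lists) for a domain, a repository and an upstream."""
    return [
        # 1. The domain is the container that groups repositories.
        ["aws", "codeartifact", "create-domain", "--domain", domain],
        # 2. The repository lives inside the domain and holds the packages.
        ["aws", "codeartifact", "create-repository",
         "--domain", domain, "--repository", repository],
        # 3. The external connection lets the repository pull from a public upstream.
        ["aws", "codeartifact", "associate-external-connection",
         "--domain", domain, "--repository", repository,
         "--external-connection", external_connection],
    ]


if __name__ == "__main__":
    # Names below are illustrative assumptions, not a recommended layout.
    for command in setup_commands("engineering-dev", "dev-web", "public:pypi"):
        print(shlex.join(command))
```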

CodeArtifactHL.drawio.png

Sounds good, should I use it?

Well, that depends. If you are starting small, running workloads in AWS, and your binaries (and future packages) belong to the supported types (Python, Node, .NET and Java), AWS CodeArtifact is a good, natural choice.

If you want to make sure that your package upload doesn't leave your VPC, CodeArtifact is your friend, thanks to VPC endpoints.

Access policies at the repository and domain level help in defining fine-grained access control. For example, you can have a "Web-components" domain with dev-web, test-web and prod-web repositories in it. The domain policy can be set up to allow access by all developers, while the repository policies restrict developers from uploading packages to the test-web and prod-web repositories and allow full access to the dev-web repository. Your automated build/publish process can be allowed to "upload" (publish) packages to the relevant repositories.

Another nice thing about AWS CodeArtifact repositories is that they can be "untyped". That is, a repository can be configured to have multiple upstreams and hold multiple package types (Python, npm, Java, .NET) in the same repo. If that is a legitimate use case you need to support, AWS CodeArtifact can be an option.

It also supports encryption at rest, using KMS.
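The dev-web/test-web/prod-web split described above could be expressed with repository policies along these lines. This is a rough sketch only: the "developer" role name is hypothetical, the account ID is the one used in the login example in this post, and the action lists are a small subset picked for illustration rather than a complete or recommended set.

```python
import json
from typing import Any, Dict

# Hypothetical principal: a "developer" role in the example account.
ACCOUNT_ID = "999933311111"
DEVELOPER_ROLE_ARN = f"arn:aws:iam::{ACCOUNT_ID}:role/developer"

# Illustrative subsets of CodeArtifact actions, not exhaustive lists.
READ_ACTIONS = [
    "codeartifact:DescribeRepository",
    "codeartifact:GetRepositoryEndpoint",
    "codeartifact:ListPackages",
    "codeartifact:ReadFromRepository",
]
PUBLISH_ACTIONS = [
    "codeartifact:PublishPackageVersion",
    "codeartifact:PutPackageMetadata",
]


def repository_policy(principal_arn: str, allow_publish: bool) -> Dict[str, Any]:
    """Build a repository policy: read is always allowed, publish is allowed or denied."""
    statements = [
        {
            "Sid": "AllowRead",
            "Effect": "Allow",
            "Principal": {"AWS": principal_arn},
            "Action": READ_ACTIONS,
            "Resource": "*",
        },
        {
            "Sid": "AllowPublish" if allow_publish else "DenyPublish",
            "Effect": "Allow" if allow_publish else "Deny",
            "Principal": {"AWS": principal_arn},
            "Action": PUBLISH_ACTIONS,
            "Resource": "*",
        },
    ]
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    # Developers may publish to dev-web only.
    for repo, can_publish in [("dev-web", True), ("test-web", False), ("prod-web", False)]:
        print(f"# repository policy for {repo}")
        print(json.dumps(repository_policy(DEVELOPER_ROLE_ARN, can_publish), indent=2))
```

Each generated document can be saved to a file and attached with aws codeartifact put-repository-permissions-policy. The build pipeline would get its own principal with publish rights on the repositories it needs.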

Now, the not so good bits

AWS CodeArtifact is somewhat similar to AWS ECR. One needs to log in to the repository, and the login command is rather long and wordy.

aws codeartifact login --tool pip --repository dev-web --domain engineering-dev --domain-owner 999933311111 --region ap-southeast-2

Or, if a VPC endpoint is used:

aws codeartifact login --tool pip --repository dev-web --domain engineering-dev --domain-owner 999933311111 --endpoint-url https://vpce-asdf879p3rasdmbmb.api.codeartifact.ap-southeast-2.vpce.amazonaws.com --region ap-southeast-2
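One way to live with the long command is to wrap it in a small helper so the flags are written down once. The sketch below assembles the same login command from its parts and only adds --endpoint-url when a VPC endpoint is given. The function name and parameter names are my own, not part of the AWS CLI, and the script prints the command rather than executing it.

```python
import shlex
from typing import List, Optional


def login_command(tool: str, repository: str, domain: str, domain_owner: str,
                  region: str, endpoint_url: Optional[str] = None) -> List[str]:
    """Assemble the 'aws codeartifact login' command as an argument list."""
    command = [
        "aws", "codeartifact", "login",
        "--tool", tool,
        "--repository", repository,
        "--domain", domain,
        "--domain-owner", domain_owner,
    ]
    if endpoint_url:
        # Only needed when the API is reached through a VPC endpoint.
        command += ["--endpoint-url", endpoint_url]
    command += ["--region", region]
    return command


if __name__ == "__main__":
    cmd = login_command("pip", "dev-web", "engineering-dev", "999933311111", "ap-southeast-2")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```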

CodeArtifact doesn't come with package scanning (vulnerability scanning) for uploaded or downloaded packages. Even package blacklisting would have been very beneficial, but no such feature is available (at least for now).

The supported package types are another limitation. Other package managers (like Artifactory) support most package types, while CodeArtifact supports only a small subset of them. The lack of support for RPMs, RubyGems and Zip/tar is indeed a drawback.

So is it useful then?

I would say it is a useful tool in certain scenarios. If you want to build and publish from within your network/VPC, or if you want your applications/services running in subnets with no Internet access to receive packages, AWS CodeArtifact presents an easy solution. At a very high level it could look something like the diagram below.

CodeArtifact-vpc.drawio.png

Finally, at what cost?

AWS CodeArtifact is a managed service and that comes with a cost. The cost has multiple components:

  • Storage per month (the sum of the sizes of the packages you are storing, including the public packages that were pulled from upstream).
  • CodeArtifact requests per month
  • Data transfer (data transferred in from the Internet and data transferred between CodeArtifact and other AWS services within the same region are free)

There is a free tier for CodeArtifact that includes 2GB of storage and 100,000 requests per month. Refer to CodeArtifact pricing for more details.
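For a rough feel of how storage and requests add up, here is a small back-of-the-envelope estimator. The free tier figures (2GB and 100,000 requests) come from the text above. The two rates passed in the example are placeholders, and billing requests per block of 10,000 is an assumption on my part, so plug in the real numbers from the CodeArtifact pricing page. Data transfer is left out entirely.

```python
# Free tier allowances mentioned in this post.
FREE_STORAGE_GB = 2.0
FREE_REQUESTS = 100_000


def estimate_monthly_cost(storage_gb: float, requests: int,
                          price_per_gb_month: float,
                          price_per_10k_requests: float) -> float:
    """Estimate the monthly bill for storage and requests, after the free tier.

    Both prices are inputs: take them from the pricing page for your region.
    """
    billable_storage_gb = max(0.0, storage_gb - FREE_STORAGE_GB)
    billable_requests = max(0, requests - FREE_REQUESTS)
    storage_cost = billable_storage_gb * price_per_gb_month
    request_cost = (billable_requests / 10_000) * price_per_10k_requests
    return storage_cost + request_cost


if __name__ == "__main__":
    # Placeholder rates, for illustration only.
    cost = estimate_monthly_cost(storage_gb=10, requests=500_000,
                                 price_per_gb_month=0.05,
                                 price_per_10k_requests=0.05)
    # 8 billable GB * 0.05 + 40 blocks of 10,000 requests * 0.05
    print(f"Estimated monthly cost: {cost:.2f}")
```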