ENV RENV_CONFIG_REPOS_OVERRIDE=https://packagemanager.posit.co/cran/__linux__/noble/latest
I had a chance to create a replication package for my coding assignment this fall. While doing the assignment, I decided to use this opportunity to actually write a Dockerfile that works on most platforms (e.g. macOS, Linux, ARM/AMD architectures). This post covers my trial and error in building a Docker image from that Dockerfile.
(R specific) Incompatible binary issue with certain packages
If you have used Linux, you are probably very familiar with the installation issues with R packages. It just takes an awfully long time to install most packages on Linux. The reason is simple: unlike Windows or macOS, many packages do not have pre-compiled binaries for Linux. This is partly because there are so many different Linux distributions. Since Windows and macOS are popular OSes (Ugh…), it is easier to maintain standard binaries for them. That is not the case for Linux. As a result, Linux users need to compile each package from source and literally create the binary themselves. Since most of the powerful packages require compilation (they use C/C++/Fortran for efficiency), it is no surprise that installation takes a long time. Even worse, compilation sometimes fails because you do not have the necessary compilers or system libraries, which means you need to install all of those dependencies before compiling the package.
Fortunately, this issue was recently solved with the help of RStudio (Posit) Public Package Manager. This is a service that provides fast installation of binary R packages for Linux. Nowadays, most packages have a binary counterpart for Linux, so you can significantly reduce installation time.
The problem, however, is that the package manager is not perfect. I am not exactly sure why, but it seems a binary might not work on some newer versions of a Linux distribution if it was built against different versions of the system libraries. This seems to be the case for packages like sf and stringi.
So how can we solve this? Well, if you are using the renv R package as your package dependency manager, you can simply use the following command in your Dockerfile to override the repository written in the renv.lock file:
ENV RENV_CONFIG_REPOS_OVERRIDE=https://packagemanager.posit.co/cran/__linux__/noble/latest
This command overrides the repository and installs packages from the repository built for the specific Linux distribution you are using. In this case, I set it to noble, which is Ubuntu 24.04 LTS. This goes nicely with the Rocker project and GitHub Actions because their newest images are based on Ubuntu 24.04 LTS.
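If you want to avoid hard-coding noble, the override URL could be derived from the distribution codename. The sketch below is only an illustration: the helper names ppm_repo_url and detect_codename are hypothetical, the URL pattern is the one shown in this post, and it assumes /etc/os-release defines VERSION_CODENAME (as it does on Ubuntu), falling back to noble otherwise.

```shell
#!/bin/sh
# Build a Posit Public Package Manager URL for a given distribution codename.
# Hypothetical helper; the URL pattern is the one used in this post.
ppm_repo_url() {
    printf 'https://packagemanager.posit.co/cran/__linux__/%s/latest\n' "$1"
}

# Read the codename of the running system from /etc/os-release.
# Assumption: VERSION_CODENAME is defined there; fall back to noble if not.
detect_codename() {
    detected=""
    if [ -r /etc/os-release ]; then
        detected="$(. /etc/os-release; printf '%s' "${VERSION_CODENAME:-}")"
    fi
    printf '%s\n' "${detected:-noble}"
}

# Export the override so that a later renv restore in the same shell sees it.
RENV_CONFIG_REPOS_OVERRIDE="$(ppm_repo_url "$(detect_codename)")"
export RENV_CONFIG_REPOS_OVERRIDE
echo "$RENV_CONFIG_REPOS_OVERRIDE"
```

Inside a Dockerfile the plain ENV line is simpler. A script like this is mostly useful when the same setup runs on several base images.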
But there is a caveat: the current version (1.1.5) of the renv package does not support this feature. This is a known issue (#2127). Fortunately, it has been resolved, but the fixed version is not yet released. For now, you need to use the development version of the renv package.[1] You can install it by running the following command:
RUN R -e "install.packages('renv', repos = 'https://posit.r-universe.dev')"
Multi-platform issue
Going deeper into reproducibility, you need to realize that there are two main CPU architectures: amd64 (x86_64) and aarch64 (arm64). The problem is that a base Docker image built for one CPU platform will not work on the other. Also, there are cases where different architectures require different binaries to install certain software. That is, some binaries built for amd64 will not work on aarch64 and vice versa.
Why should we care about this? Well, the reason is simple: both architectures are widely used. amd64 is a very common architecture for many types of computers. aarch64 is also very common. In fact, you are using aarch64 if you are on an Apple Silicon Mac. So the big problem is that a base Docker image built for amd64 may not work for people using macOS.
Fortunately, the solution is simple: build both amd64 and aarch64 images. You can do this locally using the docker buildx command, or you can use this GitHub Actions yaml file:
name: build_docker
on:
  push:
    branches: [ master, main ]
jobs:
  docker:
    runs-on: ubuntu-latest
    env:
      IMAGE_NAME: r_4.5.1 # repo name on Docker Hub: DOCKERHUB_USERNAME/r_4.5.1
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      # Enable QEMU so we can cross-build arm64 on amd64 runners
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3
      # Set up Buildx builder
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Login to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Build and push (multi-arch) base Dockerfile
        uses: docker/build-push-action@v6
        with:
          context: .
          file: ./Dockerfile
          platforms: linux/amd64,linux/arm64 # <-- key line
          push: true
          # tag strategy: latest + branch + branch-<full commit sha>
          tags: |
            ${{ secrets.DOCKERHUB_USERNAME }}/${{ env.IMAGE_NAME }}:latest
            ${{ secrets.DOCKERHUB_USERNAME }}/${{ env.IMAGE_NAME }}:${{ github.ref_name }}
            ${{ secrets.DOCKERHUB_USERNAME }}/${{ env.IMAGE_NAME }}:${{ github.ref_name }}-${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          pull: true
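For the local route, a docker buildx invocation might look roughly like the sketch below. The image name and the builder name multiarch are placeholders, not taken from my setup, and the script only prints the commands unless DRY_RUN=0 is set. Cross-building arm64 on an amd64 machine also assumes QEMU emulation is available locally.

```shell
#!/bin/sh
# IMAGE is a hypothetical placeholder; point it at your own repository.
IMAGE="${IMAGE:-your-dockerhub-user/r_4.5.1}"
PLATFORMS="linux/amd64,linux/arm64"

# Echo the command when DRY_RUN=1 (the default here), run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create and select a builder that can target several platforms.
run docker buildx create --name multiarch --use
# Build both architectures and push them under a single tag.
run docker buildx build --platform "$PLATFORMS" --tag "$IMAGE:latest" --push .
```

Run it once as is to check the printed commands, then again with DRY_RUN=0 after docker login to actually build and push.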
Another multi-platform issue: tinytex
I was also trying to install Quarto because I used it to render my assignment pdf. However, I ran into a problem when installing the tinytex package. Quarto needs a tex compiler to render the pdf. One that is used a lot is tinytex because it is a portable and lightweight tex distribution. Installing tinytex in the amd64 image was fine because all I needed to do was:
RUN quarto install tinytex
This command is provided by Quarto. The issue, however, came up when I tried to do the same thing for the aarch64 image. Apparently, the previous command runs a binary built for the amd64 architecture, so it cannot install tinytex in the aarch64 image. To solve this, I created this if statement in the Dockerfile to install tinytex for the aarch64 image.
# Quarto
ENV QUARTO_VERSION=1.7.32
RUN /rocker_scripts/install_quarto.sh

# --- TinyTeX install (arm64 manual, amd64 via Quarto) ---
RUN set -eux; \
    if [ "$ARCH_TYPE" = "arm64" ]; then \
        wget -qO- "https://yihui.org/tinytex/install-unx.sh" | sh -s - --admin --no-path; \
    else \
        quarto install tinytex; \
    fi

# Set TinyTeX path
ENV PATH="/root/.TinyTeX/bin/aarch64-linux:/root/.TinyTeX/bin/x86_64-linux:${PATH}"

# Set CTAN mirror for tlmgr
ENV TEXLIVE_REPOSITORY="https://ctan.math.illinois.edu/systems/texlive/tlnet"

# Set repo in tlmgr
RUN tlmgr option repository "$TEXLIVE_REPOSITORY"; \
    tlmgr update --self; \
    tlmgr update --all
# -------------------------------------------------------
Note that if you are only making an amd64 image, you don’t need to set the tinytex path. I am doing this because in the aarch64 image, tinytex is installed manually and the path is not set by the Quarto command.
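The if statement above branches on ARCH_TYPE, which has to be defined somewhere before that RUN step. One possible way to derive such a value, sketched here as an assumption rather than what the full Dockerfile does, is to normalize the output of uname -m. The helper names normalize_arch and tinytex_bin_dir are hypothetical; the bin directories are the two from the PATH line above.

```shell
#!/bin/sh
# Map the machine name reported by `uname -m` to Docker-style names.
normalize_arch() {
    case "$1" in
        x86_64|amd64)  echo "amd64" ;;
        aarch64|arm64) echo "arm64" ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Pick the TinyTeX bin directory that matches the architecture.
tinytex_bin_dir() {
    case "$1" in
        amd64) echo "/root/.TinyTeX/bin/x86_64-linux" ;;
        arm64) echo "/root/.TinyTeX/bin/aarch64-linux" ;;
        *) return 1 ;;
    esac
}

# Detect the current machine and report the matching values.
ARCH_TYPE="$(normalize_arch "$(uname -m)")" \
    && echo "ARCH_TYPE=$ARCH_TYPE bin=$(tinytex_bin_dir "$ARCH_TYPE")"
```

When building with BuildKit, declaring ARG TARGETARCH in the Dockerfile gives you amd64 or arm64 directly, which may be a simpler source for the same value.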
tinytex CTAN mirror error
This was the most annoying issue I encountered. When tinytex encounters a tex package that is not installed on the local system, it uses a CTAN mirror to install the necessary tex packages. The problem is that the mirror it picks is very random… So if you don’t set the mirror, it might occasionally connect to a stale one and fail to install the necessary tex packages. To solve this, you just need to manually set a mirror that seems to be “working.” This is done in the previous code block.
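If a single pinned mirror ever goes down, one option is to probe a short list of mirrors and use the first one that responds. This is only a sketch under some assumptions: the function names are hypothetical, the probe assumes each mirror serves tlpkg/texlive.tlpdb under its tlnet directory, and the second URL in the usage comment is the generic CTAN redirector rather than a mirror I tested.

```shell
#!/bin/sh
# Succeed if the mirror serves the TeX Live package database.
# Assumption: the database lives at tlpkg/texlive.tlpdb under the tlnet URL.
probe_mirror() {
    curl -fsSIL --max-time 10 "$1/tlpkg/texlive.tlpdb" >/dev/null 2>&1
}

# Print the first mirror in the argument list that passes the probe.
first_working_mirror() {
    for candidate in "$@"; do
        if probe_mirror "$candidate"; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "no working CTAN mirror found" >&2
    return 1
}

# Usage (needs network access and tlmgr on the PATH):
#   repo="$(first_working_mirror \
#       "https://ctan.math.illinois.edu/systems/texlive/tlnet" \
#       "https://mirror.ctan.org/systems/texlive/tlnet")" \
#     && tlmgr option repository "$repo"
```

Probing at build time makes the image depend on whichever mirror happened to answer, so pinning one known-good mirror, as in the Dockerfile, is still the more reproducible default.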
If you want to look at the full Dockerfile, you can find it here.
Footnotes
[1] Let’s hope the newer version gets released soon…