Personal Dockerfile for data science

Personal Dockerfile template for my research workflow
software
Author

Hyoungchul Kim

Published

March 15, 2025

During spring break, I had some time to finalize the Dockerfile template for my research workflow. If you don’t know what a Docker is, check it out here. TL;DR, it is a light virtual machine that contains all the necessary resources for your empirical analysis. It is as if you are shipping your computer to other people that wants to try out your analysis.

If you want the exact replication project folder, click here.

This is the final Dockerfile (for now):

# Use Rocker image as the base for R
FROM rocker/r-ver:4.4.0

LABEL maintainer="Hyoungchul Kim <hchul.kim96@gmail.com>"

## Update and install system dependencies
RUN apt-get update && apt-get install -y \
    libcurl4-openssl-dev \
    libssl-dev \
    libfontconfig1-dev \
    libharfbuzz-dev \
    libfribidi-dev \
    libfreetype6-dev \
    libpng-dev \
    libtiff5-dev \
    libjpeg-dev \
    libglpk-dev \
    libxml2-dev \
    libcairo2-dev \
    libgit2-dev \
    libpq-dev \
    libsasl2-dev \
    libsqlite3-dev \
    libssh2-1-dev \
    libxt-dev \
    libgdal-dev \
    wget \
    curl \
    vim \
    git 

## Install Pandoc and Quarto (Required for RMarkdown, Quarto, etc.)
# RUN /rocker_scripts/install_pandoc.sh
# RUN /rocker_scripts/install_quarto.sh

## Install Python & Poetry
RUN /rocker_scripts/install_python.sh && \
    pip3 install --upgrade pip && \
    pip3 install poetry

# Ensure Poetry installs dependencies in the system environment
RUN poetry config virtualenvs.create false

# Copy Poetry files and install dependencies
COPY pyproject.toml poetry.lock .
RUN poetry install --no-interaction --no-root

## Install Julia 1.11.3 (to match Manifest.toml)
ENV JULIA_VERSION=1.11.3
RUN /rocker_scripts/install_julia.sh

## Set working directory
WORKDIR /project

## Copy renv.lock file into the folder
COPY renv.lock .

# Set environment variables for renv
ENV RENV_VERSION=1.0.7
ENV RENV_PATHS_CACHE=/renv/cache
ENV RENV_CONFIG_REPOS_OVERRIDE=https://cloud.r-project.org
ENV RENV_CONFIG_AUTOLOADER_ENABLED=FALSE
ENV RENV_WATCHDOG_ENABLED=FALSE
RUN echo "options(renv.consent = TRUE)" >> .Rprofile
RUN echo "options(RETICULATE_MINICONDA_ENABLED = FALSE)" >> .Rprofile

# Install renv from CRAN (avoiding bootstrapping by specifying version)
RUN R -e "install.packages('renv', repos = c(CRAN = 'https://cloud.r-project.org'))"
RUN R -e "renv::consent(provided = TRUE)"

# Run renv restore to restore the environment
RUN R -e "renv::restore(confirm = FALSE)"

# Install Julia packages and manage dependencies
COPY Manifest.toml Project.toml .
ENV JULIA_PROJECT=/project
RUN julia -e "import Pkg; Pkg.activate(\".\"); Pkg.instantiate()"

# Copy over the rest of the project files
COPY . .

# Default command
CMD ["bash"]

Although most of the commands are self-explanatory from the comments, here is some additional info on what my Dockerfile does:

  1. In my personal Dockerfile, I have added three major programming languages that I use: R, Python, and Julia.

  2. I have also added some necessary dependencies and useful programs (e.g. git, vim) for my analysis.

  3. I have also added dependency management software for all of the languages. renv for R, poetry for Python, and Pkg environment for Julia. This allows you to use the exact versions of the packages that you installed in your analysis.

  4. For this code to fully work, you would need to setup some files (e.g. Project.toml, Manifest.toml, etc).

  5. Save the above code as Dockerfile (no extensions) right inside your project directory.

Quick terminology

  • Dockerfile: “The sheet music.” The list of layers and instructions for building a Docker image.

  • Image: “The MP3 file.” This is the tarball.

  • Container: “Song playing on my phone.” A container is a running instance of an image.

Docker workflow

  1. Create Dockerfile.

  2. Build Docker image using Dockerfile.

# IMPORTANT: If this does not go through, maybe try sudo command
docker build --network=host --tag <PROJECT_NAME>:VERSION . 

# Above command might not work well in computers such as Macbook with Apple Silicon (Ofc...). This is mainly because it has a different cpu architecture (arm64 instead of amd64) so the base image is different. I am not sure if this is the best solution but try this code below.
# docker build --platform=linux/amd64 --tag <PROJECT_NAME>:VERSION .
  1. Run Docker image.
docker run -it --rm <PROJECT_NAME>:VERSION