Building a Data Pipeline

Session Description

In this session, we’ll explore the basic workflow that we’ll use throughout the semester to package and share analysis. We’ll develop familiarity with Quarto and with basic operations in GitHub so that you are able to share your code and analysis.

Lab 1 Link

Before Class

Ensure that your computer has the latest stable versions of R and RStudio installed.

Accept the GitHub invitation to our Lab 1 repository and download the repository to your local computer (we will set up more advanced tools for interacting with GitHub in our next lab session).

D’Ignazio, Catherine, and Lauren F. Klein. (2020). Data Feminism. MIT Press. Chapter 1, Chapter 2

Reflect

Workflows

Which common tasks in your workflows do you think would benefit from a data pipeline?
How do we hold ourselves accountable for our analysis?

Readings

Whose interests and goals do you seek to represent through your work?
How does Collins’ matrix of domination (structural, disciplinary, hegemonic, interpersonal) interact with acts of data-driven storytelling?
What missing datasets (akin to the Library of Missing Datasets) have you observed?¹
What’s an analysis that you’d like to reconstruct in ways that challenge the power it manifests?

Slides

Resources for Further Exploration

Footnotes

At the beginning of our session, we’ll catalog some of these datasets; it may help to write down some of your thoughts to share.