One-pager on MIT’s ./missing-semester

A self-contained summary of the “Missing Semester of Your CS Education” course.

I summarized the 2020 ./missing-semester course from MIT into one handy reference document.

The content is self-contained so you can either use it as a guide for the course, or refer to it on its own.

Feel free to drop a comment (or pull request!) for any questions or feedback.

Enjoy!

1 Shell basics

1.2 Use single quotations to nullify meaning

Enclosing characters in single quotes preserves the literal value of each character within the quotes. A single quote may not occur between single quotes, even when preceded by a backslash.1

1.3 sudo

sudo stands for “super user do”2.

2 Shell tools and scripting

2.1 Save values by setting variables

Set variable using x=[value] with no spaces around the equals sign.

Access it with echo $x.

2.2 How to write bash scripts

We can write a bash script by copying the following into mcd.sh:

mcd () {
  mkir -p "$1"
  cd "$1"
}

$1 refers to the first positional argument the function receives. Putting double quotations around it ensures that names that contain a space don’t accidentally get truncated.

If you load this function into the terminal with source mcd.sh you can run it using mcd [dirname].

Some more variable names:

Command Description
$1 to $9 First 9 positional arguments (use ${i} to access argumennts past the 9th).
$0 Name of the filename that the script is being run from (will be /bin/bash for the example above.)
$_ Last argument of the last command. So we could have changed line 3 above to cd "$_".
$# Number of arguments passed in.
!! Reruns the previous command. Useful for redoing things that failed due to permissions: sudo !!.
$? Exit code of the previous command. You can use it to catch errors with || and &&3.

2.3 Command substitution saves output to a variable

For instance, x=(pwd) will save your working directory into x.

Process substitution lets you pass command output into commands that expect a file instead of STDIN: cat <(ls) <(ls ..) will concatenate the contents of the current directory with that of its parent directory.

2.4 Grab files names using patterns with globbing wildcards

Wildcard Description
? Matches one position: x? matches x1 but not x10.
* Matches multiple positions: x* matches x1 and x10.
{} Perform expansion of all values. Use multiple for Cartesian product; e.g. echo {1,2}{3,4} returns 13 14 23 24. Use {x..y..i} to go from x to y with increment i.

2.5 Useful misc. shell commands

Command Description Example
spellcheck Analyze shell scripts for errors and bugs. spellcheck script.sh
tldr Get concise help pages (with examples!). tldr [command]
find Find files recursively in a folder. find [dir] -name '[filename|*.ext]'

find can also process the results:

# Delete all files with .tmp extension
find . -name '*.tmp' -exec rm {} \;

2.6 Alternatives to find with extra functionality

Command Benefit
fd Modernized implementation with simpler syntax.
locate Faster lookup via file indexing (updated by updatedb).
grep Search the content of files.
rg (ripgrep) Faster, modernized grep.
rga Use rg on PDFs, and other documents.

2.7 Use history to avoid re-typing commands

Command Description
history Returns all commands that you ran.
history [n] Returns the last n commands.
history | rg [search term] Searches through commands.
<ctrl>-r Built-in search.<ctrl>-r to cycle through entries and <ctrl>-c to cancel the search.

You can improve <ctrl>-r by incorporating fzf to add fuzzy matching. Learn how with this command:

cat $(rg README <(apt show fzf) | cut -d ' ' -f 3)

Hopefully it makes sense to you based on what you’ve read so far!

2.8 Directory navigation commands

Command Description
tree Basic tree visualizer.
broot Nicer tree UI.
nnn Mac-style explorer.
less Break long text into pages.

3 Editors (Vim)

3.1 Vim operates in different modes

Mode How to enter
NORMAL <esc>
INSERT i
REPLACE R
VISUAL v
V-LINE <shift>-v
V-BLOCK <ctrl>-v
COMMAND-LINE :

3.2 Movement around using keyboard shortcuts

Command Description
j/k Down/up.
h/l Left/right.
w Next word.
b/e Beginning/end of word.
0/$/^ Move to beginning/end/first non-empty character of line.
gg/G Beginning/end of file.
H/M/L Top/middle/bottom of screen.
<ctrl>-u/d Page up/down.
[number]G Go to line.
% Go to corresponding parenthesis, bracket, etc.
f/F Find a character forward/backward. Navigate back and forth with ,/;.
t/T Like find but up until the character instead of on it.

You can do a lot of the movement commands n times, e.g. [n]w to go forward n words.

3.3 Edit text efficiently with commands

Command Description
a Append to current position.
A Append to end of current line.
r Replace one character.
R REPLACE mode.
d Delete.
c Delete movement and switch to insert mode.
x/X Delete a character forwards/backwards.
s Delete a character and switch to insert mode.
S Delete a line and switch to insert mode.

Editing commands can modified via movement commands, e.g. dw will delete a word, and d[n]w will delete n words.

Quickly duplicate a line with yy p.

3.4 Modifiers select text within blocks

Modifier Description
a Around.
i Inner.

Modifiers change the meaning of a command.

So ci( will delete the contents inside a pair of parentheses, while ca( will include the parentheses.

Modifiers can also be used in visual mode: vas will select the current sentence, vap the paragraph, and va( the parenthetical block.

3.5 Find

Find things with / followed by what you are looking for. Pressing <enter> will jump you to the closest occurrence. n/N cycle forwards/backwards through the results.

4 Data wrangling

4.1 Stream editing processes sequences of elements

There are a number of commands you can use to process streams:

Command Description Syntax
sed Process text in place using regex. Use s for text substitution and -E for modern syntax**. sed -E s/[pattern]/[replacment]/[flag]
awk Process columnar data. Access the columns using $n. Using BEGIN and END enable stateful behavior. awk 'command'
xargs Takes lines of inputs and turns them into command arguments. [STDOUT] | xargs [command]

Also worth mentioning:

  • sort
  • uniq (requires a sorted list; -c to include counts)
  • paste (-s to concatenate columns and -d to change the delimiter)
  • head/tail
  • bc (calculator! Use with -l to read STDIN).

4.2 Regex is a language to capture text patterns

Here are some common building blocks:

Expression Description
. Any character.
[...] A set of characters.
[^...] The opposite of the set.
(x|y|...) A set of strings.
(...)* 0 or more.
(...)+ 1 or more.
(...)? 0 or 1.
(...){n} N times.
(...){n,} N or more times.
(...){n, m} N to m times.
^ Anchor for beginning of string.
$ Anchor for end of string.

Adding ? suffix to quantifiers (*, +, ?, {n}, {n,}, {n, m}) will toggle lazy matching which will find the leftmost match. Quantifiers are greedy by default, meaning they will match the rightmost character, which can lead to unintended matches.

Create a capture group around an expression using parentheses (). Then you can select it positionally in a replace statement using \n to indicate the nth group:

# Return just the filenames from a list of PDFs.
ls *.pdf | sed -E 's/^(.+).pdf$/\1/'

5 Command-line environment

5.1 Manage multiple processes using job control

You can control processes via standardized POSIX signals4 that interrupt, pause, background start, etc. programs:

Shortcut Description
<ctrl>-c SIGINT, interrupts the current process.
<ctrl>-\ SIGQUIT, quits the process similar to SIGINT.
<ctrl>-z SIGSTOP, suspends the current process.

You can catch these signals to add functionality like saving program state when interrupted for instance.

There are also commands to manage unfinished processes:

Command Description
jobs View unfinished jobs. -l to list PID (Process ID), -p only list PID, etc. (cf. tldr jobs).
ps Get a snapshot of processes. Similar to jobs.
bg %n/PID Continue unfinished job in the background using %n position in jobs queue or PID.
fg %n\PID Bring and resume a background process to the foreground.
kill %n/PID Terminate job.
nohup Continue job even if terminal session is closed by by ignoring SIGHUP hangup signal.
disown %n/PID nohup already running jobs from current session.
[process] & Start command in background (will print to STDOUT unless redirected).

<ctrl>-z followed by bg will send the current process to execute in the background.

5.2 Multiplexers enable multitasking within a single session

tmux, a popular multiplexer provides the following hierarchy:

  • Sessions are an independent workspace.
  • Windows are like tabs within a workspace.
  • Panes are individual splits within a window.

Once a session begins (tmux new -s NAME) commands are bound to <ctrl>-b. This is commonly remapped to <ctrl>-a.

Then you can create new windows and rearrange panes. Use <ctrl>-b + ? for a list of options.

Also feel free to consult this tutorial and its followup on customization.

6 Version control (Git)

6.1 Git tracks data efficiently using hashing

Commits in Git come with a long alphanumeric string. This string is called a hash and it links each commit to the data it represents.

A hash is a unique string generated by the content of a file. If the file changes, so does the hash. Git uses this fact to store files efficiently: modified files get a new hash, while unedited files keep the same value.

Commits refer to the hash value of the root directory. The root directory itself contains the hashes of its contents, and so on. By storing hashes instead of data, Git avoids having to copy the same files over and over again for each commit. This hash-based data structure is called content-addressed storage (CAS).

Use git cat-file -p [commit hash] to explore your commits!

6.2 Useful Git commands

Command Description
git clone --depth=1 Clone a git repository without the commit history.
git add -p [file] Select lines of code within a file to add (patch).
git blame [file] See who and which commit authored each line of a file.
git show [commit] Display what changed in each commit.

Guide: how to write good commits.

7 Debugging

7.1 Loggers collect print statements for later analysis

“The most effective debugging tool is still careful thought, coupled with judiciously placed print statements” — Brian Kernighan, Unix for Beginners.

A logger will automatically save print statements to a file. It can capture anything from summary statistics about incoming data to error messages. You can set severity levels5 to particular log entries to enable filtering through data as needed.

Logs let you inspect runs to track and resolve issues over time. This strategy pays off as project complexity increases.

/var/logs/ contains system logs.

7.2 Debuggers help you fix code interactively

Debuggers are programs that let you step through code interactively so you can figure out what it is doing.

ipdb6 is Python’s built-in pdb with niceties such as colored output, tab completion and so on. You start it by running python -m ipdb [script].

These are some useful commands for pdb/ipdb:

Command Description
l Lists +/- 5 lines around a line. Use l . for the current line, or l [start] [end] for an arbitrary range.
ll Long lists all the code around the current line.
s Run the current line and step to the next available line. This is could be inside of another function, so it’s not necessarily the next numerical line!
n Execute the current line and stop at the next numerical line, jumping over any function calls.
restart Re-run the program from the top.
c Continue executing from the current line until the program halts, either due to hitting a breakpoint, completing, or crashing.
p [var] Print the value of any variable in memory.
b [line number] Add a breakpoint to the specified line. The debugger will halt execution when it hits this line.
q Quit.

General debuggers like gdb work on any language and provide low-level information on hardware registers etc..

7.3 Static analyzers fix programming errors and formatting issues

Linters check code for logical issues such as referencing an undefined variable. They will typically not edit code but instead highlight the issue for the programmer to fix. Examples include pyflakes, and flake8.

Formatters standardize code and ensure compliance with style guides such as PEP. They usually edit the file for you. Examples include autopep8, black, and isort.

ruff7 is a new linter and formatter offering 10-100x faster performance.

8 Building and versioning

8.1 Build systems produce an output by managing dependencies

Builders manage the entire process to create a target file by recursively generating every dependency that is required. It’s a powerful idea that allows you to execute an entire workflow in a single command8.

The most popular builder is make. It looks for a file called Makefile which contains instructions on how to build each file in the process:

[target]: [dependencies]
  [bash commands to produce the target based on dependencies]

[dependency]: [subdependencies]
  [bash command to produce dependency based on subdependencies]

...

There are great tutorials on how to write a good makefile.

It’s a good idea to have make manage the high-level workflow and delegate the creation of particular files to domain-specific builders. This prevents the makefile from becoming too complicated.

8.2 Semantic versioning enables dependency management

Semantic versioning is a popular form of program versioning that allows you to tell at a glance whether a version is compatible for your software:

version = major.minor.patch
  • patch updates do not change the API.
  • minor updates add to the API and are backwards compatible.
  • major updates change the API in a non-backwards-compatible way.

This framework allows dependency managers ensure program stability by accepting minor updates, for instance, and rejecting major updates until the code is refactored.

Python uses semantic versioning. It is why Python 2 scripts are not compatible with Python 3, and vice-versa.

Footnotes

  1. https://www.gnu.org/software/bash/manual/html_node/Single-Quotes.html↩︎

  2. Obligatory: https://xkcd.com/149/.↩︎

  3. https://www.gnu.org/software/bash/manual/bash.html#Lists↩︎

  4. Look up man signal for a comprehensive list of every signal.↩︎

  5. cf. Python’s severity levels.↩︎

  6. https://github.com/gotcha/ipdb↩︎

  7. https://github.com/astral-sh/ruff↩︎

  8. For instance, you could build a pipeline to run an experiment and automatically report the results in a LaTeX-rendered report, or to build, test, and publish a programming package, or to render and ship a website.. the possibilities are endless!↩︎