I summarized the 2020 ./missing-semester course from MIT into one handy reference document.
The content is self-contained so you can either use it as a guide for the course, or refer to it on its own.
Feel free to drop a comment (or pull request!) for any questions or feedback.
Enjoy!
1 Shell basics
1.1 Use redirection and piping to link commands together
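- The > and < operators redirect output to, and input from, a file or stream. x < y redirects y as input to x. x > y redirects the output of x to y.
- The >> operator appends to a file instead of overwriting it. (<< does not append: it starts a here-document that feeds the following lines to a command as input.)
- The | operator chains programs together by feeding the output of one into the input of the next.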
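As a small illustration of these operators, the sketch below writes to a scratch file, appends to it, reads it back as input, and pipes the result through another program. The file name and the use of mktemp are assumptions made for the example.

```shell
# Work in a throwaway directory so nothing important gets overwritten.
tmp=$(mktemp -d)

echo "first line" > "$tmp/notes.txt"     # > creates or overwrites the file
echo "second line" >> "$tmp/notes.txt"   # >> appends to it

# < feeds the file to wc as standard input.
line_count=$(wc -l < "$tmp/notes.txt")

# | passes the output of cat to tr, which upper-cases it.
shouted=$(cat "$tmp/notes.txt" | tr 'a-z' 'A-Z')

echo "lines: $line_count"
echo "$shouted"
```

Swapping the >> for a second > would leave only "second line" in the file, which is the usual way redirection loses data by accident.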
1.2 Use single quotations to nullify meaning
Enclosing characters in single quotes preserves the literal value of each character within the quotes. A single quote may not occur between single quotes, even when preceded by a backslash.1
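A short sketch of the difference between single and double quotes; the variable names are made up for the example.

```shell
name="world"

literal='Hello, $name'    # single quotes: $name is kept as literal text
expanded="Hello, $name"   # double quotes: $name is replaced by its value

echo "$literal"    # prints: Hello, $name
echo "$expanded"   # prints: Hello, world

# A single quote cannot appear inside single quotes, so close the string,
# add an escaped quote, and reopen it.
escaped='it'\''s'
echo "$escaped"    # prints: it's
```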
1.3 sudo
sudo stands for “super user do”2 and runs a command with root privileges.
2 Shell tools and scripting
2.1 Save values by setting variables
Set a variable using x=[value] with no spaces around the equals sign.
Access it with echo $x.
2.2 How to write bash scripts
We can write a bash script by copying the following into mcd.sh:
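A minimal sketch of such a script, assuming it defines a function named mcd that creates a directory and then moves into it. The body is inferred from the description that follows, so treat it as an illustration rather than the original listing.

```shell
mcd () {
    mkdir -p "$1"   # create the directory (and any missing parents)
    cd "$1"         # then move into it
}
```

Because both uses of $1 are quoted, a call such as mcd "my project" creates and enters a single directory whose name contains a space.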
$1 refers to the first positional argument the function receives. Putting double quotes around it prevents the shell from splitting a name that contains a space into several arguments.
If you load this function into the terminal with source mcd.sh you can run it using mcd [dirname].
Some more variable names:
Command | Description |
---|---|
$1 to $9 | First 9 positional arguments (use ${n}, e.g. ${10}, to access arguments past the 9th). |
$0 | Name of the script that is being run (will be /bin/bash for the example above). |
$_ | Last argument of the last command. So we could have changed line 3 above to cd "$_" . |
$# | Number of arguments passed in. |
!! | Reruns the previous command. Useful for redoing things that failed due to permissions: sudo !! . |
$? | Exit code of the previous command. You can use it to catch errors with || and && 3. |
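The sketch below exercises a few of these names. The helper describe_args is hypothetical and exists only for the illustration; !! is left out because history expansion only works in an interactive shell.

```shell
# Hypothetical helper used only for this illustration.
describe_args () {
    echo "received $# argument(s); the first is: ${1:-<none>}"
}

describe_args apple banana   # prints: received 2 argument(s); the first is: apple

# $? holds the exit code of the previous command: 0 means success.
# && runs the next command only on success, || only on failure.
true && echo "true exited with $?"     # prints: true exited with 0
false || echo "false exited with $?"   # prints: false exited with 1
```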
2.3 Command substitution saves output to a variable
For instance, x=$(pwd) will save your working directory into x.
Process substitution lets you pass command output into commands that expect a file instead of STDIN: cat <(ls) <(ls ..) will concatenate the listing of the current directory with that of its parent directory.
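A brief sketch of both kinds of substitution. Process substitution is a bash feature, so that part is routed through bash -c in case the snippet is run by a plain POSIX sh; the sample strings are made up.

```shell
# Command substitution: run pwd and keep its output in a variable.
here=$(pwd)
echo "working directory: $here"

# It can also be embedded directly in a string.
echo "this directory holds $(ls | wc -l) entries"

# diff expects two files; <(...) makes each command's output look like one.
# diff exits with 1 when the inputs differ, hence the "|| true".
differences=$(bash -c 'diff <(printf "a\nb\n") <(printf "a\nc\n")' || true)
echo "$differences"
```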
2.4 Grab file names using patterns with globbing wildcards
Wildcard | Description |
---|---|
? | Matches exactly one character: x? matches x1 but not x10 . |
* | Matches any number of characters, including none: x* matches x , x1 and x10 . |
{} | Expands to every listed value. Use multiple for a Cartesian product; e.g. echo {1,2}{3,4} returns 13 14 23 24 . Use {x..y..i} to go from x to y with increment i . |
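The wildcards can be tried out safely in a scratch directory, as in this sketch. The file names are invented for the example, and brace expansion goes through bash -c because it is a bash feature rather than part of POSIX sh.

```shell
# Set up a scratch directory with a few predictable file names.
dir=$(mktemp -d)
touch "$dir/x1" "$dir/x2" "$dir/x10" "$dir/y1"
cd "$dir"

single=$(echo x?)   # ? matches exactly one character
any=$(echo x*)      # * matches any run of characters
echo "x? -> $single"   # prints: x? -> x1 x2
echo "x* -> $any"      # prints: x* -> x1 x10 x2

# Two brace groups side by side give every combination.
product=$(bash -c 'echo {1,2}{3,4}')
echo "$product"        # prints: 13 14 23 24
```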
2.5 Useful misc. shell commands
Command | Description | Example |
---|---|---|
shellcheck | Analyze shell scripts for errors and bugs. | shellcheck script.sh |
tldr | Get concise help pages (with examples!). | tldr [command] |
find | Find files recursively in a folder. | find [dir] -name '[filename|*.ext]' |
find can also process the results, for example by running a command on each match with -exec.
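One way to process matches is the -exec action, sketched here on a scratch directory tree. The directory layout and file names are assumptions made for the example.

```shell
# Build a small scratch tree to search through.
root=$(mktemp -d)
mkdir -p "$root/src/lib"
touch "$root/src/main.py" "$root/src/lib/util.py" "$root/src/old.tmp"

# List every Python file below $root, however deeply nested.
find "$root" -name '*.py' | sort

# -exec runs a command once per match; {} stands for the matched path
# and \; marks the end of the command.
find "$root" -name '*.tmp' -exec rm {} \;

remaining=$(find "$root" -type f | wc -l)
echo "files left: $remaining"
```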
2.6 Alternatives to find with extra functionality
Command | Benefit |
---|---|
fd | Modernized implementation with simpler syntax. |
locate | Faster lookup via file indexing (updated by updatedb ). |
grep | Search the content of files. |
rg (ripgrep) | Faster, modernized grep . |
rga | Use rg on PDFs and other documents. |
2.7 Use history to avoid re-typing commands
Command | Description |
---|---|
history | Returns all commands that you ran. |
history [n] | Returns the last n commands. |
history | rg [search term] | Searches through commands. |
<ctrl>-r | Built-in search. Press <ctrl>-r again to cycle through entries and <ctrl>-c to cancel the search. |
You can improve <ctrl>-r by incorporating fzf to add fuzzy matching. Learn how with this command:
Hopefully it makes sense to you based on what you’ve read so far!
3 Editors (Vim)
3.1 Vim operates in different modes
Mode | How to enter |
---|---|
NORMAL | <esc> |
INSERT | i |
REPLACE | R |
VISUAL | v |
V-LINE | <shift>-v |
V-BLOCK | <ctrl>-v |
COMMAND-LINE | : |
3.2 Move around using keyboard shortcuts
Command | Description |
---|---|
j/k | Down/up. |
h/l | Left/right. |
w | Next word. |
b/e | Beginning/end of word. |
0/$/^ | Move to beginning/end/first non-blank character of line. |
gg/G | Beginning/end of file. |
H/M/L | Top/middle/bottom of screen. |
<ctrl>-u/d | Scroll half a page up/down. |
[number]G | Go to line. |
% | Go to corresponding parenthesis, bracket, etc. |
f/F | Find a character forward/backward on the current line. Navigate back and forth with ,/; . |
t/T | Like find but up until the character instead of on it. |
You can repeat most movement commands n times, e.g. [n]w to go forward n words.
3.3 Edit text efficiently with commands
Command | Description |
---|---|
a | Append after the cursor. |
A | Append to end of current line. |
r | Replace one character. |
R | REPLACE mode. |
d | Delete (takes a movement). |
c | Change: delete a movement and switch to insert mode. |
x/X | Delete a character forwards/backwards. |
s | Delete a character and switch to insert mode. |
S | Delete a line and switch to insert mode. |
Editing commands can be modified via movement commands, e.g. dw will delete a word, and d[n]w will delete n words.
Quickly duplicate a line with yy followed by p.
3.4 Modifiers select text within blocks
Modifier | Description |
---|---|
a | Around. |
i | Inner. |
Modifiers change the meaning of a command.
So ci( will delete the contents inside a pair of parentheses and switch to insert mode, while ca( will include the parentheses.
Modifiers can also be used in visual mode: vas will select the current sentence, vap the paragraph, and va( the parenthetical block.
3.5 Find
Find things with / followed by what you are looking for. Pressing <enter> will jump you to the next occurrence. n/N cycle forwards/backwards through the results.
4 Data wrangling
4.1 Stream editing processes sequences of elements
There are a number of commands you can use to process streams:
Command | Description | Syntax |
---|---|---|
sed | Edit a text stream using regex (add -i to edit a file in place). Use s for text substitution and -E for extended (modern) regex syntax. | sed -E 's/[pattern]/[replacement]/[flag]' |
awk | Process columnar data. Access the columns using $n . BEGIN and END blocks run before and after the input, which enables stateful behavior such as running totals. | awk 'command' |
xargs | Takes lines of input and turns them into command arguments. | [STDOUT] | xargs [command] |
Also worth mentioning:
- sort
- uniq (requires a sorted list; -c to include counts)
- paste (-s to join all lines of a file into one and -d to change the delimiter)
- head/tail
- bc (calculator! Reads expressions from STDIN; use -l to load the math library).
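The tools above combine naturally in pipelines. The sketch below runs them over a made-up request log; the log format, file, and field meanings are assumptions for the example.

```shell
# Hypothetical request log: one "user status bytes" record per line.
log=$(mktemp)
printf '%s\n' \
    'alice 200 512' \
    'bob 404 128' \
    'alice 200 256' \
    'carol 500 64' > "$log"

# sed: replace 4xx and 5xx status codes with a label (-E enables extended regex).
sed -E 's/ (4|5)[0-9]{2} / ERROR /' "$log"

# awk: sum the third column; END runs once after the last line.
total=$(awk '{ sum += $3 } END { print sum }' "$log")
echo "total bytes: $total"

# cut + sort + uniq -c: count requests per user, busiest first.
top_user=$(cut -d' ' -f1 "$log" | sort | uniq -c | sort -rn | head -n 1 | awk '{ print $2 }')
echo "busiest user: $top_user"

# xargs: turn lines of input into arguments of another command.
cut -d' ' -f1 "$log" | sort -u | xargs echo "users:"
```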
4.2 Regex is a language to capture text patterns
Here are some common building blocks:
Expression | Description |
---|---|
. | Any character. |
[...] | A set of characters. |
[^...] | The opposite of the set. |
(x|y|...) | One of several alternative strings. |
(...)* | 0 or more. |
(...)+ | 1 or more. |
(...)? | 0 or 1. |
(...){n} | N times. |
(...){n,} | N or more times. |
(...){n,m} | N to m times. |
^ | Anchor for beginning of string. |
$ | Anchor for end of string. |
Adding a ? suffix to a quantifier (*, +, ?, {n}, {n,}, {n,m}) makes it lazy, so it matches as few characters as possible. Quantifiers are greedy by default, meaning they match as many characters as possible, which can lead to unintended matches.
Create a capture group around an expression using parentheses (). Then you can select it positionally in a replace statement using \n to indicate the nth group (\1, \2, and so on).
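A short sketch of capture groups and greedy matching using sed -E. The sample strings are made up, and the negated-set trick is shown because sed's regex flavor does not offer lazy quantifiers.

```shell
# Swap "First Last" into "Last, First" using two capture groups.
swapped=$(echo "Ada Lovelace" | sed -E 's/^([A-Za-z]+) ([A-Za-z]+)$/\2, \1/')
echo "$swapped"    # prints: Lovelace, Ada

# Greedy matching: .* grabs as much as it can, so the match runs to the
# LAST closing bracket on the line.
greedy=$(echo "[a] and [b]" | sed -E 's/\[.*\]/X/')
echo "$greedy"     # prints: X

# A negated set stops at the FIRST closing bracket instead.
narrow=$(echo "[a] and [b]" | sed -E 's/\[[^]]*\]/X/')
echo "$narrow"     # prints: X and [b]
```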
5 Command-line environment
5.1 Manage multiple processes using job control
You can control processes via standardized POSIX signals4 that interrupt, suspend, resume, or terminate programs:
Shortcut | Description |
---|---|
<ctrl>-c | SIGINT , interrupts the current process. |
<ctrl>-\ | SIGQUIT , quits the process similar to SIGINT , but also dumps core by default. |
<ctrl>-z | SIGTSTP , suspends the current process (a catchable relative of SIGSTOP ). |
You can catch these signals to add functionality, for instance saving program state when interrupted.
There are also commands to manage unfinished processes:
Command | Description |
---|---|
jobs | View unfinished jobs. -l to list PID (Process ID), -p only list PID, etc. (cf. tldr jobs ). |
ps | Get a snapshot of running processes. Similar to jobs , but not limited to the current shell. |
bg %n/PID | Continue unfinished job in the background using %n position in jobs queue or PID. |
fg %n/PID | Bring a background job to the foreground and resume it. |
kill %n/PID | Terminate job (sends SIGTERM by default). |
nohup | Continue job even if the terminal session is closed, by ignoring the SIGHUP hangup signal. |
disown %n/PID | Get the effect of nohup for jobs already running in the current session. |
[process] & | Start command in background (will still print to STDOUT unless redirected). |
<ctrl>-z followed by bg will send the current process to execute in the background.
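The interactive shortcuts cannot be shown in a script, but the same ideas work with PIDs. This sketch starts a background job, terminates it, and inspects its exit status; the choice of sleep 30 is arbitrary.

```shell
# Start a long-running command in the background with &.
sleep 30 &
pid=$!                      # $! holds the PID of the last background job
echo "started sleep with PID $pid"

# kill sends SIGTERM by default, asking the process to terminate.
kill "$pid"

# wait returns the job's exit status; 128 + 15 (SIGTERM) = 143.
wait "$pid" || status=$?
echo "sleep exited with status ${status:-0}"
```

An exit status above 128 is the shell's way of reporting that the process was ended by a signal rather than finishing on its own.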
5.2 Multiplexers enable multitasking within a single session
tmux, a popular multiplexer, provides the following hierarchy:
- Sessions are independent workspaces.
- Windows are like tabs within a session.
- Panes are individual splits within a window.
Once a session begins (tmux new -s NAME), commands are prefixed with <ctrl>-b. This prefix is commonly remapped to <ctrl>-a.
Then you can create new windows and rearrange panes. Use <ctrl>-b + ? for a list of options.
Also feel free to consult this tutorial and its followup on customization.
6 Version control (Git)
6.1 Git tracks data efficiently using hashing
Commits in Git come with a long alphanumeric string. This string is called a hash and it links each commit to the data it represents.
A hash is a unique string generated by the content of a file. If the file changes, so does the hash. Git uses this fact to store files efficiently: modified files get a new hash, while unedited files keep the same value.
Commits refer to the hash value of the root directory. The root directory itself contains the hashes of its contents, and so on. By storing hashes instead of data, Git avoids having to copy the same files over and over again for each commit. This hash-based data structure is called content-addressed storage (CAS).
Use git cat-file -p [commit hash] to explore your commits!
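To see the chain of hashes in practice, the sketch below builds a throwaway repository and walks from a commit to its tree to a file. The repository location, file name, and commit identity are placeholders for the demo.

```shell
# Scratch repository; the identity below is a placeholder for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "hello" > greeting.txt
git add greeting.txt
git -c user.name="Demo" -c user.email="demo@example.com" commit -q -m "Add greeting"

# The commit object points at the hash of the root directory (a tree).
git cat-file -p HEAD

# The tree lists the hash of each entry it contains.
git cat-file -p 'HEAD^{tree}'

# Identical content always hashes to the same value, so it is stored once.
blob_hash=$(git rev-parse HEAD:greeting.txt)
same_hash=$(git hash-object greeting.txt)
echo "$blob_hash"
echo "$same_hash"
```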
6.2 Useful Git commands
Command | Description |
---|---|
git clone --depth=1 | Clone a git repository with only the latest commit instead of the full history (a shallow clone). |
git add -p [file] | Select lines of code within a file to add (patch). |
git blame [file] | See who and which commit authored each line of a file. |
git show [commit] | Display what changed in a commit. |
Guide: how to write good commits.
7 Debugging
7.1 Loggers collect print statements for later analysis
“The most effective debugging tool is still careful thought, coupled with judiciously placed print statements” — Brian Kernighan, Unix for Beginners.
A logger will automatically save print statements to a file. It can capture anything from summary statistics about incoming data to error messages. You can assign severity levels5 to log entries so that you can filter them as needed.
Logs let you inspect runs to track and resolve issues over time. This strategy pays off as project complexity increases.
On most Unix-like systems, /var/log/ contains system logs.
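A minimal sketch of the idea in shell, assuming a hypothetical log helper and a scratch log file; real projects would normally use their language's logging library instead.

```shell
# Hypothetical log file and helper; neither comes from the course.
LOG_FILE=$(mktemp)

log () {
    level=$1
    shift
    # Timestamp, severity level, then the message itself.
    printf '%s [%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$level" "$*" >> "$LOG_FILE"
}

log INFO "loaded 120 rows"
log WARNING "3 rows had missing values"
log ERROR "could not reach database"

# Filter by severity afterwards, e.g. show only the errors.
grep '\[ERROR\]' "$LOG_FILE"
```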
7.2 Debuggers help you fix code interactively
Debuggers are programs that let you step through code interactively so you can figure out what it is doing.
ipdb6 is an enhanced version of Python’s built-in pdb with niceties such as colored output, tab completion and so on. You start it by running python -m ipdb [script].
These are some useful commands for pdb/ipdb:
Command | Description |
---|---|
l | Lists 11 lines (+/- 5) around the current line. Use l . to return to the current line, or l [first], [last] for an arbitrary range. |
ll | Long list: shows the whole source of the current function or frame. |
s | Run the current line and step to the next available line. This could be inside another function, so it’s not necessarily the next numerical line! |
n | Execute the current line and stop at the next line in the current function, stepping over any function calls. |
restart | Re-run the program from the top. |
c | Continue executing from the current line until the program halts, either due to hitting a breakpoint, completing, or crashing. |
p [var] | Print the value of a variable (or any expression) in the current scope. |
b [line number] | Add a breakpoint to the specified line. The debugger will halt execution when it hits this line. |
q | Quit. |
General-purpose debuggers like gdb work with many compiled languages and provide low-level information such as hardware registers.
7.3 Static analyzers fix programming errors and formatting issues
Linters check code for logical issues such as referencing an undefined variable. They will typically not edit code but instead highlight the issue for the programmer to fix. Examples include pyflakes and flake8.
Formatters standardize code and ensure compliance with style guides such as PEP 8. They usually edit the file for you. Examples include autopep8, black, and isort.
ruff7 is a newer linter and formatter that advertises 10-100x faster performance than these tools.
8 Building and versioning
8.1 Build systems produce an output by managing dependencies
Builders manage the entire process to create a target file by recursively generating every dependency that is required. It’s a powerful idea that allows you to execute an entire workflow in a single command8.
The most popular builder is make. It looks for a file called Makefile which contains instructions on how to build each file in the process:
[target]: [dependencies]
	[shell commands to produce the target based on dependencies]
[dependency]: [subdependencies]
	[shell commands to produce dependency based on subdependencies]
...
There are great tutorials on how to write a good makefile.
It’s a good idea to have make manage the high-level workflow and delegate the creation of particular files to domain-specific builders. This prevents the makefile from becoming too complicated.
8.2 Semantic versioning enables dependency management
Semantic versioning is a popular form of program versioning that allows you to tell at a glance whether a version is compatible with your software:
version = major.minor.patch
- patch updates do not change the API.
- minor updates add to the API and are backwards compatible.
- major updates change the API in a non-backwards-compatible way.
This framework allows dependency managers to ensure program stability by accepting minor updates, for instance, and rejecting major updates until the code is refactored.
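The acceptance rule described above can be sketched as a small shell function. The helper is_compatible is hypothetical, and real dependency managers apply richer rules (version ranges, pre-release tags, and so on).

```shell
# Hypothetical helper: accept a candidate version if it has the same major
# number as the required version and a minor number at least as high.
# The patch level is ignored in this sketch.
is_compatible () {
    required=$1
    candidate=$2
    req_major=$(echo "$required" | cut -d. -f1)
    req_minor=$(echo "$required" | cut -d. -f2)
    cand_major=$(echo "$candidate" | cut -d. -f1)
    cand_minor=$(echo "$candidate" | cut -d. -f2)

    [ "$cand_major" -eq "$req_major" ] && [ "$cand_minor" -ge "$req_minor" ]
}

is_compatible 2.3.1 2.5.0 && echo "2.5.0 is accepted"   # minor update
is_compatible 2.3.1 3.0.0 || echo "3.0.0 is rejected"   # major update
```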
Python's version numbers broadly follow this idea, although not strictly. It is why Python 2 scripts are not compatible with Python 3, and vice versa.
Footnotes
1. https://www.gnu.org/software/bash/manual/html_node/Single-Quotes.html
2. Obligatory: https://xkcd.com/149/.
4. Look up man signal for a comprehensive list of every signal.
8. For instance, you could build a pipeline to run an experiment and automatically report the results in a LaTeX-rendered report, or to build, test, and publish a programming package, or to render and ship a website. The possibilities are endless!