Unix philosophy and writing scripts
The Unix philosophy is a design philosophy for programming and scripting. I will list its rules here, but you can read more about it yourself if you want. The reason the Unix philosophy is still relevant today is that it shapes the standard scripting experience on any GNU/Linux system (Linux is a kernel, GNU stands for "GNU's Not Unix", and both are heavily inspired by Unix).
- Rule 1: Modularity. Write simple programs with clean interfaces
- Rule 2: Clarity. Don't be a smartass
- Rule 3: Composition. Write programs that can be composed into bigger programs
- Rule 4: Separation. Separate interfaces from their engines.
- Rule 5: Simplicity. Add complexity only where needed (complex algorithms or data structures are often buggy)
- Rule 6: Parsimony. Write a big program only when it's clear it can't be decomposed cleanly (relevant for highly coupled programs like games)
- Rule 7: Transparency. Design with visibility in mind. Don't needlessly obfuscate.
- Rule 8: Robustness. Logically follows from simplicity and transparency
- Rule 9: Representation. Make the data "smart" not the program. Smarter programs are buggier
- Rule 10: Least surprise. It should make sense
- Rule 11: Silence. It shouldn't say anything if there's nothing noteworthy
- Rule 12: Repair. Fail loudly and quickly
- Rule 13: Economy of time. Programmer time is costly, machine time isn't (real-time systems often drop this, as real-time requires high performance)
- Rule 14: Generation. Write programs to write programs so constant involvement isn't necessary
- Rule 15: Optimization. Prototype before polishing, how else can you know what to optimize?
- Rule 16: Diversity. There is no true way, only true frauds
- Rule 17: Extensibility. Make programs easily extended in future
Some of these are obvious, some are not so obvious, and some are rather specific. It's worth noting that these principles can be applied to systems that aren't Unix-like, but it may be harder. Anyway, the focus of this blog will be on simplicity, clarity and modularity. I will discuss how to write a decent shell script.
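To make two of these rules concrete, here is a minimal sketch of Rule 11 (Silence) and Rule 12 (Repair) in shell. The function name backup and the .bak suffix are made up for illustration and are not part of any standard tool.

```shell
# Hypothetical helper: copy a file to <name>.bak
# Silence: prints nothing when the copy works
# Repair: complains on standard error and returns non-zero when it can't
backup() {
    src="$1"
    if [ ! -f "$src" ]; then
        echo "backup: '$src' does not exist" >&2
        return 1
    fi
    cp "$src" "$src.bak"
}
```

Because the function is quiet on success and returns a non-zero status on failure, a calling script can check the result and stop early instead of carrying on with missing data.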
Writing shell scripts
Your shell scripts will begin with the line
#!/bin/sh
This shebang tells the system which program should run the script. sh is usually a symbolic link to another shell, such as bash or dash, depending on the distribution. It's good practice to use sh rather than bash unless you use bash features, in which case you would use the shebang #!/bin/bash. Other shebangs are available for other scripting languages, but today I will focus on shell scripting.
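If you are curious what sh actually is on your machine, the following sketch may help. It assumes /bin/sh exists, which is the case on practically every Unix-like system; what it points to will differ between distributions.

```shell
# Print the path of the sh that is found on your PATH
command -v sh

# Show whether /bin/sh is a symbolic link, and to what
ls -l /bin/sh
```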
To help demonstrate this, I will be using my blog writing script to show how I write a shell script.
# Variables
blogdir="content/blog/";
rssfile="../html/content/rss.xml"
year="2021"
Lines beginning with a hash (#) are comments. A word followed by an equals sign and a value (NO SPACES AROUND THE =) assigns a variable. You will observe I used a semi-colon. In shell scripting a semi-colon ends a command, which lets you put several commands on one line; at the end of a line it is optional, because the newline already ends the command. There are other places where it has a special meaning, but I won't discuss them here.
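A small sketch of comments, assignment and semi-colons; the variable names are made up for illustration.

```shell
# This line is a comment and is ignored by the shell
greeting="hello"                 # no spaces around the =
name="world"; mark="!"           # a semi-colon lets two commands share a line
message="$greeting $name$mark"   # variables can be used inside double quotes
echo "$message"                  # prints: hello world!
```

Writing greeting = "hello" with spaces would not assign anything: the shell would try to run a command called greeting instead.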
echo "Enter a blog title"; read -r blogtitle;
blogtitle=`echo "$blogtitle" | sed "s/ /_/g"`
blogtitle="$blogtitle.gmi"
st -e nvim "$blogdir$year/$blogtitle";
Here I call the program echo. echo can take a number of flags, but I don't use them. You can look these flags up in a terminal by typing man echo. In the first line, I echo a prompt asking the user to enter a title, then I call the program read with the -r flag and the name of the variable that will hold the input. The -r flag stops read from treating backslashes as escape characters. This allows me to get user input straight from the terminal. I could use dmenu or some other program if I wanted to, but I chose not to.
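To see what the -r flag changes without typing anything, here is a minimal sketch that feeds read from a here-document instead of the keyboard. The variable names raw and cooked are made up for this example, and printf is used for printing because some versions of echo interpret backslashes themselves.

```shell
# With -r, backslashes in the input are kept as they are
read -r raw <<'EOF'
my\first post
EOF

# Without -r, read treats the backslash as an escape character and drops it
read cooked <<'EOF'
my\first post
EOF

printf '%s\n' "$raw"      # prints: my\first post
printf '%s\n' "$cooked"   # prints: myfirst post
```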
The second line of the script above introduces three new concepts: variables, a pipe and the backticks `. I will cover pipes and variables first.
echo "$blogtitle" | sed "s/ /_/g"
First, I call echo. The $ marks a variable: the name after the $ says which variable to use, in this case blogtitle, which was previously set by our read. The standard output of echo is piped into sed. Piping is a way of passing the output of one program as the input to another program. The program sed is a stream editor, so it allows me to modify streams of data and output them. The string I pass to it describes how I want to modify the stream, and the pattern part of it is a regular expression.
"s/ /_/g". The s means substitute. Each / separates one part of the expression from the next. The first part, between the first and second /, is what I want to replace, here a single space. The second part, between the second and third /, is what I will replace it with, _. The g after the final / tells sed to apply the substitution globally, to every match on a line and not just the first one. As a result, this will replace all space characters with underscores (useful for writing files without pesky spaces!). sed is a pretty useful and complex program; look at the man page by typing man sed to find out what else it can do.
Just from this small demonstration you can begin to see how the rules of composition, modularity and separation are all being used here. Each of these programs handles its own logic as a black box, and we only care about the output, so that we can compose the outputs together into a program.
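The pipeline can be tried out directly in a terminal. Here is a minimal sketch with a made-up title, showing what the trailing g changes, followed by a second pipeline built from two unrelated programs, sort and uniq, which know nothing about each other.

```shell
title="my first blog post"

# Without g, only the first space is replaced
echo "$title" | sed "s/ /_/"     # prints: my_first blog post

# With g, every space is replaced
echo "$title" | sed "s/ /_/g"    # prints: my_first_blog_post

# printf emits three lines, sort orders them, uniq collapses adjacent duplicates
printf 'pear\napple\npear\n' | sort | uniq
# prints:
# apple
# pear
```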
Now let's look at the backticks. They run the command inside them and substitute its output in place, so that it can be assigned to a variable. As such, blogtitle has been assigned to be itself but with spaces replaced by underscores.
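For reference, POSIX shells also accept $(...) as another spelling of the backticks. It does the same job and is easier to nest. A minimal sketch, with made-up variable names:

```shell
blogtitle="my first post"

# Command substitution with backticks, as in the script
with_backticks=`echo "$blogtitle" | sed "s/ /_/g"`

# The same thing written with $(...)
with_dollar=$(echo "$blogtitle" | sed "s/ /_/g")

echo "$with_backticks"   # prints: my_first_post
echo "$with_dollar"      # prints: my_first_post
```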
We can then give it a file extension by just using the variable in a string, as "$blogtitle.gmi". I then call st, which is my terminal emulator, to create a new terminal running neovim so I can write the article. A note about quotation marks. Backticks (`) are for evaluating command-line expressions. Double quotes (") are where you can mix variables and text, and single quotes (') do not allow the use of variables and are pure text. These 3 quotation marks make it relatively easy to use variables appropriately.
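The three kinds of quotation marks side by side, as a minimal sketch. The variable names double, single and upper are made up for the example, and tr is used only as a simple command to run.

```shell
blogtitle="my_first_post"

double="$blogtitle.gmi"                          # the variable is expanded
single='$blogtitle.gmi'                          # taken literally, no expansion
upper=`echo "$blogtitle" | tr 'a-z' 'A-Z'`       # backticks run a command

echo "$double"   # prints: my_first_post.gmi
echo "$single"   # prints: $blogtitle.gmi
echo "$upper"    # prints: MY_FIRST_POST
```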
I won't go through the rest in detail, but you can get the idea of how it is useful. I will write the rest of the program below and point out other noteworthy lines.
#!/bin/sh
# Variables
blogdir="content/blog/";
rssfile="../html/content/rss.xml"
year="2021"
# Read the name of the blog and make a file and write it
echo "Enter a blog title"; read -r blogtitle;
blogtitle=`echo "$blogtitle" | sed "s/ /_/g"`
blogtitle="$blogtitle.gmi"
st -e nvim "$blogdir$year/$blogtitle";
# Recreate the blog directory
yourTitle=`head -n 1 "$blogdir$year/$blogtitle" | sed "s/# //g"`
head -n 6 "$blogdir/blog.gmi" > "head.t"
echo "=>$year/$blogtitle $yourTitle" > "link.t"
tail -n +7 "$blogdir/blog.gmi" > "tail.t"
cat "head.t" "link.t" "tail.t" > "$blogdir/blog.gmi"
# Now we set it up for RSS
cat "$blogdir$year/$blogtitle" | sed -z "s/\n/<br \/>/g" > "rss.txt"
echo "]]></description></item>" >> "rss.txt"
echo "<item><title>$yourTitle</title><description><![CDATA[" > "rssA.txt"
cat "rss.txt" >> "rssA.txt"
rm "rss.txt"; mv "rssA.txt" "rss.txt"
# Split after line 10: head keeps lines 1-10, tail starts at line 11
head -n 10 "$rssfile" > "head.t"
tail -n +11 "$rssfile" > "tail.t"
cat "head.t" "rss.txt" "tail.t" > "$rssfile"
# Clean up temp files
rm "head.t" "link.t" "tail.t" "rss.txt"
The comments explain it well. The > character means to write to a file, overwriting everything in it. The >> characters mean to append to a file, so the output is written at the end. All of the above is me writing to temporary files, building up the pieces I need and then putting them back together.
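The difference between > and >> can be checked on a throwaway file. A minimal sketch, assuming mktemp is available (it is on GNU and BSD systems, though it is not part of POSIX); the variable names are made up:

```shell
# mktemp creates an empty temporary file and prints its name
tmp=$(mktemp)

echo "first"  > "$tmp"    # the file now contains: first
echo "second" > "$tmp"    # > overwrote it, the file now contains: second
echo "third" >> "$tmp"    # >> added a line at the end

contents=$(cat "$tmp")
echo "$contents"
# prints:
# second
# third

# Clean up
rm "$tmp"
```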
You will observe a number of programs not mentioned before. First is cat. cat takes a number of flags, but here it just lets me output the contents of a file. Some programs have flags or arguments for taking file input directly, though.
rm removes files (and, with the -r flag, directories), which is useful for deleting these temporary files. mv moves a file or directory to a new location, or just renames it. head takes the first n lines of a file. tail takes the last n lines of a file or, when the number is written as +n, everything from line n onwards.
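The head, new line, tail pattern that the script uses twice can be tried on a throwaway file. A minimal sketch, assuming mktemp is available (GNU and BSD systems have it, though it is not part of POSIX); the file names and contents are made up:

```shell
# Work in a temporary directory so nothing real is touched
workdir=$(mktemp -d)
page="$workdir/page.txt"
printf 'line 1\nline 2\nline 3\nline 4\n' > "$page"

head -n 2 "$page" > "$workdir/head.t"    # lines 1 and 2
echo "NEW LINE" > "$workdir/new.t"       # the line to insert
tail -n +3 "$page" > "$workdir/tail.t"   # line 3 to the end

# Glue the three pieces back together, in order
cat "$workdir/head.t" "$workdir/new.t" "$workdir/tail.t" > "$page"

result=$(cat "$page")
echo "$result"
# prints:
# line 1
# line 2
# NEW LINE
# line 3
# line 4

# Clean up
rm -r "$workdir"
```

Note that head -n 2 and tail -n +3 use consecutive numbers. Using the same number for both would copy that line twice.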
All the above programs can be understood better by READING THE MANUAL. There is a common saying, RTFM, which stands for Read the fucking manual. Quite simply, a lot of the flags you need for a program are listed there.
This has been a brief look at shell scripting. If you understand variable assignment, command execution and the programs you use in it, it's very easy to compose complex nuanced scripts and behaviours from simple and dumb programs.
Thanks for reading!
P.S. A lot of the Unix philosophy is mainly relevant to writing small C programs to be used in shell scripts, as I've demonstrated above. This is just us taking advantage of a system built on the Unix philosophy. A system like Windows can't take advantage of these ideas as much, because it wasn't designed with this philosophy in mind. The philosophy also tends to fall apart for large or highly coupled programs, so it's not always applicable.
And remember, there's never one true way to write scripts or programs universally. It's up to you to determine the best way from your current knowledge and experience.