Last updated April 2024

by Jason D. Josephson

For errata/feedback: jasonjosephson602@gmail.com

# Fundamentals of Python

Before doing much with Python, we'll cover the basics of how to write simple code. Python code is designed to be intuitive to read and write, so this won't be much of a challenge.

**NOTE**: You cannot learn to write Python just by reading a book/tutorial. You need to actually use it yourself. I BEG you to do the problems at the end of each section. And you're encouraged to think of any other projects you can do with Python, even if very small, to gain experience.

## I. Data types, variables, and built-in functions






### i. Data types and operators




First, let's add 1+1.

In [None]:
1+1

Python returned the answer. Now let's try the same thing, but with quotation marks around each '1'.

In [None]:
'1'+'1'

Python interpreted the two lines of code very differently, because `1` and `'1'` signified different **data types**. We gave Python totally different kinds of data to work with, and it interpreted our code accordingly. The lesson: ensuring your data is the right type is important.

Common data types are:

*   Strings
*   Integers (ints)
*   Floating point numbers (floats)
*   Booleans (bools)

Strings represent series of characters, e.g., letters, numbers, symbols, and spaces. Strings always go between quotation marks. You can use single (') or double (") quotes, but you can't mix them together around a single string (e.g., `'a'` and `"a"` are fine, but not `'a"`).


In [None]:
'abc123 !@#$%^&*()'

The `+` operator can be used to concatenate strings, i.e., join them together.

In [None]:
'a'+'b'+'c'

Integers (ints) are just integers. The `+` operator adds ints in the usual way.

In [None]:
2023

In [None]:
10+23

No quotes go around ints. If you put quotes around a number, it becomes a string, not an int. That's why `1 + 1` gave us 2 while `'1'+'1'` concatenated the two strings to form the string `'11'`. Python did the work of interpreting the `+` operator in two very different ways, depending on how we structured our code.

What happens if we mix `'1'` and `1`?

In [None]:
'1'+1

We get an error. When you receive an error message, the line which caused the error and the type of error will be shown, usually along with some kind of specific message. In this case, it was a `TypeError`, since we were trying to use the `+` operator with incompatible data *types*.
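One way around this error is to convert one of the two values so that both share a type. The sketch below borrows the `int()` and `str()` conversion functions, along with `print()` and variables, all of which are covered in section ii; the variable names are only for illustration.

```python
# Convert the string to an int, then add numerically
as_number = int('1') + 1
print(as_number)   # 2

# Or convert the int to a string, then concatenate
as_text = '1' + str(1)
print(as_text)     # 11
```

Which conversion is right depends on whether you want arithmetic or concatenation.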

Let's go back to ints. You can do arithmetic with other operators as well:

In [None]:
3-1

In [None]:
3*2

In [None]:
10/5

Be careful: exponents are indicated with `**`, **NOT** `^`.

In [None]:
2**3

Order of operations is followed. Parentheses can be used. You can also add spaces between operators, ints, strings, etc. to make the code look nicer/easier to read; it has no effect on the output.

In [None]:
(1+3)*2**3+1

In [None]:
(1 + 3) * 2**3 + 1
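Two precedence cases involving `**` tend to surprise people, so here is a small sketch of them. It uses `print()`, which is introduced in section ii, only to show several results in one cell.

```python
# Exponentiation binds more tightly than a leading minus sign
print(-2**2)      # -4, read as -(2**2)
print((-2)**2)    # 4

# Chained exponents are evaluated from right to left
print(2**3**2)    # 512, read as 2**(3**2)
print((2**3)**2)  # 64
```

When in doubt, adding parentheses costs nothing and makes the intent explicit.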

The notion of code being easy to read (readable) by a human reader may seem unimportant. In fact, it is one of the most important goals in programming. Computing may be done by the machine, but programming is done by people. The more difficult reading your code is, the more difficult it is to use, modify, apply, and create from it.

On this subject, then, we will introduce **inline comments**. These comments are written within the code and online exist so that humans can read them; they are ignored by the Python interpreter. If you write the character `#`, all subsequent writing on that line will be part of a comment.

In [None]:
# nothing will happen when you run this, because it's just a comment

In [None]:
# python will ignore this line and move on to the next one
'ABC'

In [None]:
'ABC' # you can even put a comment midway through a line

Comments are an important form of code **documentation**. You can write down what a line or block of code does or explain something that might not be clear to a human reading the code. Very often *you* might forget what some seldom-used block of your own code does, so even if nobody else will see it, it's good practice to write comments often.

For better or for worse, coding is a human art. Comments and documentation may only seem like a footnote to the "actual" code, but they are of profound importance.

Floats are numbers with decimal points. Floats and ints are different data types. (Think about the difference in memory used to store `2` vs. `2.000000000000`.) But arithmetic with floats works the same way.

In [None]:
2.4 - 2.1 * 2.0

That little 3 at the end is an artifact of how the machine does math. Sometimes it may cause issues; we'll soon learn how to round floats to the nearest int or a given decimal place. But if you see a result like the above, use common sense and recognize that the answer is -1.8.

If you try to use a mix of ints and floats, Python can handle this, but the number returned will always be a float.

In [None]:
2 + 1.0
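Division with `/` is a related case: it returns a float even when both numbers are ints and the result is a whole number. A quick check, using the `type()` and `print()` functions covered in section ii:

```python
print(10 / 5)        # 2.0, a float even though the result is whole
print(type(10 / 5))  # <class 'float'>
print(type(10 + 5))  # <class 'int'>
```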

Here are some other operators: `n // m` will divide n by m, then round the answer down to the nearest integer (and return an int, provided n and m are both ints).

In [None]:
5//2
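"Rounding down" matters most for negative numbers, where it is not the same as dropping the decimal part. A short sketch of that case, and of what happens when one operand is a float:

```python
print(5 // 2)     # 2
print(-5 // 2)    # -3, rounded down (toward negative infinity), not toward zero
print(5.0 // 2)   # 2.0, a float because one operand is a float
```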

`n % m` (% pronounced "modulo", or "mod") will divide n by m and return the remainder (an int, provided n and m are both ints).

In [None]:
12 % 5
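The two operators `//` and `%` fit together: the quotient times the divisor, plus the remainder, gives back the original number. The sketch below checks this for the example above and shows a common use of `%`, testing whether a number is even. The variable names are illustrative, and variables and `print()` are covered in section ii.

```python
n = 12
m = 5
quotient = n // m
remainder = n % m
print(quotient, remainder)         # 2 2
print(quotient * m + remainder)    # 12, the original n

# A number is even when dividing it by 2 leaves no remainder
print(10 % 2)   # 0, so 10 is even
print(7 % 2)    # 1, so 7 is odd
```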

Bools are data with two values: `True` and `False` (which behave like `1` and `0`, respectively). The name is from mathematician [George Boole](https://en.wikipedia.org/wiki/George_Boole); today, Boolean algebra underlies the digital logic on which computers are built.

Here's another operator: `n == m` means that n equals m. If you just write this expression, you'll get a bool as the result.

In [None]:
2 + 2 == 4

In [None]:
3 - 1 == 1002344

This operator can be used for more than just ints and floats, too.

In [None]:
'abc' == 23

In [None]:
'a' + 'b' == 'ab'

`!=` means 'not equal to'. Much more convenient than having to find a "≠" symbol.

In [None]:
2 + 2 != 4

Inequality operators (`<`, `>`) can also be used. `>=` is used for 'greater than or equal to', and correspondingly for `<=`.

In [None]:
3 > 4

In [None]:
-34.1 < 0 + 20

In [None]:
23 >= 23

In [None]:
322 <= 30000/10

Python also has some operators in the form of words, such as `not`, `and`, and `or`, the classic operators in Boolean algebra. If you know basic logic, they behave as expected. [Here is a guide to Boolean algebra if you're interested.](https://www.geeksforgeeks.org/boolean-algebra/)

In [None]:
2 + 2 == 4 and 1 + 1 == 2

In [None]:
2 + 2 == 5 and 1 + 1 == 2

In [None]:
2 + 2 == 5 or 1 + 1 == 2

In [None]:
2 + 2 == 5 or 1 + 1 == 3

In [None]:
not 2 + 2 == 5

WARNING: Unless a logical statement is very simple, it's best to use parentheses to avoid confusion. Note that these two statements evaluate to different bools:

In [None]:
not (2 + 2 == 5 or 1 + 1 == 2)

In [None]:
not 2 + 2 == 5 or 1 + 1 == 2

Evaluating manually, the first case solves to:

`not (2 + 2 == 5 or 1 + 1 == 2)` (evaluate `==` statements)

`not (False or True)` (substitute `False or True` ⇒ `True`)

`not (True)` (not T ⇒ F)

`False`

The second is:

`not 2 + 2 == 5 or 1 + 1 == 2` (evaluate)

`not False or True` (not F ⇒ T)

`True or True` (T or T ⇒ T)

`True`
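The same comparison can be run directly. In this sketch the names `a` and `b` are just illustrative labels for the two `==` results (variables are covered in section ii); it shows that Python reads `not a or b` as `(not a) or b`.

```python
a = 2 + 2 == 5   # False
b = 1 + 1 == 2   # True

grouped_by_python = not a or b      # how Python reads the form without parentheses
explicit_grouping = (not a) or b    # the same grouping, written out
negate_everything = not (a or b)    # negates the whole `or` expression

print(grouped_by_python, explicit_grouping, negate_everything)  # True True False
```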

Also, note that `or` means "one or the other *or both*". "Exclusive or," sometimes written as "xor" and meaning "one or the other but not both", is not an operator. But it is equivalent to `(x or y) and not (x and y)`.

In [None]:
(True or False) and not (True and False) # only one of x and y is true

In [None]:
(True or True) and not (True and True) # both x and y are true
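As a side note, when both values are already bools, the `!=` operator introduced earlier gives the same results as "exclusive or", since two bools are unequal exactly when one is true and the other is not. A small sketch covering all four combinations:

```python
print(True != False)   # True:  exactly one is true
print(False != True)   # True:  exactly one is true
print(True != True)    # False: both are true
print(False != False)  # False: neither is true
```

This shortcut only applies to actual bools; for other data types, `!=` simply compares the values.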

Here's one more operator for now: `in`. When used with strings, `n in m` will look for whether string `m` contains string `n`, i.e., whether `n` is a "substring" of `m`.

In [None]:
'Hello wor' in 'Hello world'

In [None]:
'hello' in 'Hello world'

As seen above, don't forget that strings are case sensitive. `'A'` and `'a'` are simply different pieces of data. If you want to make something case-insensitive, you'll need to design your code accordingly. (We'll see how later.)

#### **Exercises**

1. Evaluate $(2-\frac{3}{5})^{(2.3-1)}$.

2. Evaluate $\lfloor\frac{5-2}{2}\rfloor$. The unusual brackets indicate the [floor function](https://en.wikipedia.org/wiki/Floor_and_ceiling_functions).

3. Write a single line of code that returns the bool `True` if neither "$12.3^{38} \le 11^{37}$" nor the statement "`'abc'` is a substring of `'spaac'`" is true, and otherwise returns `False`.

4. a. Re-write (that is, write a logically equivalent statement to) the expression `not ('x' or 'y')` using only the `not` and `and` operators (and the strings, of course, and you can use parentheses).

  b. Re-write `not ('x' and 'y')` using only the `not` and `or` operators (and the strings, of course, and you can use parentheses).

  These equivalences are instances of De Morgan's laws.

### ii. Built-in functions and variables


A **function** is a block of code that you can "execute," i.e., run, on command. To "call" the function and have it execute, write the function's name followed by parentheses, e.g., `function()`. Many functions accept inputs as well; these are called "arguments" and are put in between the parentheses after the function name, e.g., `function(argument1, argument2)`.

Python has functions built-in, i.e., pre-made functions that you don't need to get from anywhere else. The `print()` function prints whatever you give as an argument. By "prints", we mean, roughly, that it displays the argument for the user to see. Multiple arguments can be "passed" (given); just separate them with commas.

In [None]:
print('abc')

In [None]:
print(2.345)

In [None]:
print('A', 2, -234.5, False)

The `type()` function returns the data type of an argument.

In [None]:
type('encomium')

In [None]:
type(2 == 3)

In the latter example, notice that Python first evaluated the expression `2 == 3` and then executed the function. That is, `2 == 3` gave the bool `False`, and `False` was then passed as the argument to the `type()` function.

In [None]:
print(type(1 + 1))
print(type(1 + 1.0))

In both of these cases, the addition was first evaluated, then the result was passed to `type()`, then *that* result was passed to `print()`.

But the text produced by printing the result from `type()` is different compared to the text displayed when using `type()` without `print()`. Why? We'll get to that in a moment.

`round()` rounds a float to the nearest int. A second argument can optionally be used to round to that number of decimal places. But if no second argument is given, the function defaults to rounding to the nearest int. When rounding, the number goes to whichever candidate is closer. In an exact tie (e.g., `23.5` or `2.5`), Python rounds to the nearest *even* number rather than always rounding up, so `round(23.5)` gives 24 but `round(2.5)` gives 2.

In [None]:
print(round(3.4556))
print(round(23.5))
print(round(2.345676543, 2))
print(round(10/3, 1))
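Exact ties deserve a closer look: Python's built-in `round()` sends a value that sits exactly halfway between two candidates to the even one, which differs from the "always round 5 up" rule taught in school. A short sketch of this behavior:

```python
print(round(0.5))    # 0
print(round(1.5))    # 2
print(round(2.5))    # 2
print(round(3.5))    # 4

# With decimal places, the stored binary value decides the direction:
# 2.675 is stored as slightly less than 2.675, so it rounds down
print(round(2.675, 2))   # 2.67
```

The last line is another consequence of the float artifacts seen earlier, not a separate rounding rule.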

As you know, `==` is the 'equals' operator. So what is `=`? Just one equals sign is used to **assign variables**. Variable names are alphanumeric (though they can't start with a number), and you don't put quotes around them. Underscores can be in them as well, but not spaces.

In [None]:
x = 2
print(x)

In [None]:
xyz = 'Hello world'
abc = '!'

print(xyz + abc)

In [None]:
distance_1 = 1.34
distance_2 = 2.20

sqr = (distance_1 - distance_2)**2

print(round(sqr, 2), 'Angstroms^2')

In [None]:
x = 345
print(type(x))

Note that variables can be overwritten!

In [None]:
x = 2
print("At first, x is equal to", x)
x = 1
print("But now, it has been overwritten to", x)
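Overwriting also lets a variable be updated from its own current value. In the sketch below (the name `count` is just an example), Python evaluates the right-hand side of `=` first and only then stores the result:

```python
count = 10
count = count + 1   # the right-hand side is evaluated first (10 + 1), then stored
print(count)        # 11
```

Read as mathematics, `count = count + 1` would be nonsense; read as "assign", it is an ordinary instruction.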

Naming variables is another apparently trivial task that is quite important. The computer doesn't care what the variable name is (so long as it's valid), but the human reading your code does. For long programs with many variables, keeping track of what's what is important. Along with inline comments, variable names are an important way of doing this. Don't name a variable something random or not easily understood. It's tempting after a long day or when you're in a rush to write `asdf = 1.4` instead of `CC_bond_length = 1.4`, but the latter is infinitely easier to understand when you or someone else has to go back and read it.

Practical tips for naming variables: (1) Be descriptive, as per above; (2) Don't make the names too long, or else the lines they're in will be too long; (3) Make them easy to read.

(1) and (2) can be at odds with each other, so try to balance them. `numberofcarboncarbonbonds` could be shortened to `numccbonds` without losing much info. (3) is usually achieved by underscores between words and/or "camel case," which means capitalizing the first letter of each word aside from the first word: `apple_apple_apple` or `appleAppleApple`, respectively. Or for our last example, `num_cc_bonds` or `numCCBonds`.

Lastly, note that variables can even store more exotic objects, as well.

("Object" is actually a technical term here. An object is basically anything in Python: strings, ints, floats, bools, functions, etc. etc. A variable is a name that refers to an object. More on this later.)

In [None]:
x = print

x('Hello world')

Now, let's go back to the issue from before. Contrast the texts displayed when the following lines are run:

In [None]:
type(True)

In [None]:
print(type(True))

In the first line, we simply "returned" the output of the function. In the second line, we did not. When an object is returned, it's as if the object is being given. We call `type(True)` and it gives us an object, spits out an object. This lets us assign a variable to the output object.

In [None]:
output = type(True)
print(output)

`type(True)` returned an object, which was assigned to the variable `output`. But the `print()` function doesn't do the same. `print()` doesn't return an object; it just displays text to the user. Let's try to assign a variable to the output of `print(True)`:

In [None]:
output = print(True)

The `print()` function printed "True" because it was called. But was anything assigned to the `output` variable?

In [None]:
print(output)

Nope. Since `print()` doesn't return anything useful, Python assigned `output` the special value `None`, which is what a function returns when it has nothing else to give back.
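To confirm this, `type()` can be applied to the variable. The sketch below repeats the assignment so that it runs on its own:

```python
output = print(True)   # prints True, then returns None
print(output)          # None
print(type(output))    # <class 'NoneType'>
```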

A block of code being executed as a whole (here, one notebook cell) can only return one object. See what happens when we try to return two:

In [None]:
type(1)
type(True)

Only the object returned by the bottom line was ultimately returned. The `type(1)` function was executed and returned its output, but nothing was done with this output, so Python continued on to the next line. The next line being the final line, its output was returned as *the* output of the code as a whole.

What if we want to see both of these outputs, or more? Do we have to execute separate blocks of code each time? No, we can just use `print()`.

In [None]:
print(type(1))
print(type(True))
print(type('dft'))

We'll move on; remember the difference between `print()`ing something and returning it.

The `len()` function returns the length of a string.

In [None]:
len('Eudoxus')

The `str()`, `int()`, `float()`, and `bool()` functions allow you to interconvert data types. This will only work in feasible cases; you can't turn `'apple'` into a float.

In [None]:
x = '24'
print(x, type(x))

x = int(x)
print(x, type(x))

x = float(x)
print(x, type(x))
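"Feasible" depends on the target type as well as the text. A string like `'24.5'` can be turned into a float, but passing it straight to `int()` raises a `ValueError`. One possible workaround, sketched with illustrative variable names, is to convert in two steps:

```python
text = '24.5'
as_float = float(text)   # 24.5
as_int = int(as_float)   # 24; int('24.5') directly would raise a ValueError
print(as_float, as_int)
```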

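Converting floats to ints discards everything after the decimal point, i.e., the number is truncated toward zero. For positive numbers this is the same as rounding *down* (the floor function); for negative numbers it is not, since `int(-23.999)` gives -23.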

In [None]:
x = 23.1
print(int(x))

x = 23.999
print(int(x))
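For negative numbers, `int()` and the floor function give different answers. The sketch below compares `int()` with `math.floor()` from Python's standard `math` module (the `import` line, not otherwise covered here, makes that module available):

```python
import math

print(int(23.999))           # 23
print(int(-23.999))          # -23, truncated toward zero
print(math.floor(-23.999))   # -24, rounded down
```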

Converting to bool is interesting. Empty strings `''` give `False`, but any other strings, even `'False'`, yield `True`. Ints and floats give `True`, except 0.

In [None]:
print(bool('blah blah'))
print(bool(''))
print(bool('False'))
print(bool(-12))
print(bool(24.2))
print(bool(0))
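A few edge cases follow from the same rules and are easy to get wrong, especially with strings that merely look empty or look like zero:

```python
print(bool(0.0))        # False: zero as a float
print(bool(' '))        # True: a space is still a character
print(bool('0'))        # True: a non-empty string, whatever it contains
print(bool(int('0')))   # False: convert to int first to test the number
```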

#### **Exercises**

1. Suppose you're writing a program in which the user enters their name, and this string is assigned to the variable `name`. Write a line in which the message "Nice to meet you, {name}!" is printed, where {name} is the user's name. We've assigned the `name` variable with an example name.

In [None]:
name = 'Jesse'
# erase this line and write your code here

2. Without adding or removing quotes, and without using the `//` operator, edit the following line to correctly evaluate the expression $\lfloor2(1.9)\rfloor$:

In [None]:
'2' * 1.9

### iii. User-defined functions

A famous programming maxim is **"Don't repeat yourself."** Remember, calling a function runs a pre-written chunk of code. If you find yourself writing the same block of code in multiple places, you usually want to avoid this by writing your own function, then calling the function in multiple places. This is shorter, easier to write, and usually easier to read.

The syntax to write a function is simply `def function(argument1, argument2, ...):`, followed by an indented code block to be executed when the function is called.

In [None]:
def say_hi():
  print('Hello!')   # the indented code block is not executed yet

In [None]:
say_hi()            # only when you call the function!

Like variables, functions can be overwritten:

In [None]:
def say_hi():
  print('Hi!')

say_hi()

Note that after the second line in the cell above, the next (non-blank) line wasn't indented, so it wasn't a part of the function's definition.

Let's make a function with an argument. This function takes a float/int, adds the number 1, then prints the sum. In other words, it's equivalent to $f(x) = x + 1$.

In [None]:
def addOne(x):
  print(x + 1)


addOne(2)
addOne(-2345)
addOne(-2.0)

But right now, `addOne()` only prints the sum. If we want to assign a variable to the result of the function, this won't work; recall the distinction between printing something and returning it. We need the function to `return` the sum:

In [None]:
def addOneMk2(x):
  return x + 1

x = addOneMk2(3)
print(x)

We see that `x` was assigned to the sum. Of course, there's nothing stopping us from both printing and returning the sum.

In [None]:
def addOneMk3(x):
  total = x + 1   # named `total` because `sum` is already a built-in function
  print(total)
  return total

x = addOneMk3(3) # the function will print the sum when run
print('The variable x was assigned to:', x)

So if you want to see what's going on while a really long function is running, then you can put `print()` calls in the middle of the function.

Note that when a function reaches a `return` statement, it stops completely at that line.

In [None]:
def return_stop():
  print(1)
  return 'asdf'  # function stops here, returning the string
  print(2)       # print(2) will be ignored completely

return_stop()
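The syntax at the start of this section allows several arguments, although the examples so far take at most one. Here is a minimal sketch of a two-argument function that returns its result; the function name `kinetic_energy` and the physics formula are chosen purely for illustration.

```python
def kinetic_energy(mass, speed):
    # mass in kg, speed in m/s; result in joules
    return 0.5 * mass * speed**2

energy = kinetic_energy(2, 3)
print(energy)   # 9.0
```

The arguments are matched by position: `2` is assigned to `mass` and `3` to `speed`.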

#### **Exercises**

1. Create the Python equivalents of the mathematical functions:

   $f(x) = x^x$

   $f(x,y) = x(y-3)^{-1}$

   $f(x) = \sum_{n=0}^{5} x^n$

2. As it stands, the code below gives an error. Correct it by changing only one line within the `universalGravity()` function.

In [None]:
def universalGravity(m1, m2, r):

  # Newton's law of universal gravitation
  # F = -G*m1*m2/r^2
  # m1 and m2 must be in kg
  # r must be in m
  # G in N m^2 kg^-2

  G = 6.67 * 10**-11
  F = -G * m1 * m2 / r**2
  print(F)


force_newtons = universalGravity(1, 2, 1)   # force in N
force_lbf = force_newtons * 0.224809        # convert N to lbf
print(force_lbf)

### iv. String methods

**Methods** are functions which are applied to a particular object. Methods do something to the given object, or return information about that object.

Strings have a number of methods which are useful and simple. To use a method on a string, the syntax is: `'string'.method()`. The `lower()` method converts a string into all lower-case letters:

In [None]:
'SPQR'.lower()

In [None]:
'toCsY'.lower()

I hope you can guess what `upper()` does.

In [None]:
'dft'.upper()
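These two methods give one way to make a comparison case-insensitive, as promised in the discussion of the `in` operator: convert both strings to the same case before comparing. A sketch with illustrative variable names:

```python
text = 'Hello world'
query = 'hello'

print(query in text)                  # False: the cases differ
print(query.lower() in text.lower())  # True: both sides lower-cased first
```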

`replace()` replaces all occurrences of a substring by another string. The first argument is the substring to be replaced; the second is the string to replace with.

In [None]:
'Tat tvam asi'.replace(' ','_')

`replace()` can also remove substrings, like so:

In [None]:
'PDB ID: 6NY9'.replace('PDB ID: ','')
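Note that string methods such as `replace()` return a *new* string and leave the original untouched. To keep the result, assign it to a variable, as in this sketch (the variable names are illustrative):

```python
pdb_label = 'PDB ID: 6NY9'
pdb_id = pdb_label.replace('PDB ID: ', '')

print(pdb_id)      # 6NY9
print(pdb_label)   # PDB ID: 6NY9 (the original is unchanged)
```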

`strip()` removes **whitespace** characters (spaces, tabs, etc.) from a string on both sides of the string, but not in the middle of the string. It's easier to demonstrate than describe:

In [None]:
string = '     Nurture yourself on emptiness and float.           '
string

In [None]:
string.strip()

(To see the difference, notice the positions of the quotation marks when the strings are returned above.)

`lstrip()` and `rstrip()` remove from the left and right sides only, respectively.

In [None]:
string.lstrip()

In [None]:
string.rstrip()

`title()` converts the first letter of each word in a string to a capital, and all other letters to lower case.

In [None]:
'red soRGhum'.title()

Other methods get information about a string instead of modifying the string. `count()` returns the number of times a substring occurs in a string. The substring is inserted between the parentheses:

In [None]:
'All this, so great, and sprung from that greatness.'.count('great')

`index()` searches for a substring in a string and returns the position in the string where the substring is found.

In [None]:
'abc'.index('b')

`'b'` was in position 2, so why did we get 1? In programming, we often start counting from 0 instead of 1. Thus, the 1st position in a string is position 0, the 2nd, position 1, etc. If the substring has multiple characters, the index returned is that where the substring starts.

In [None]:
'abc'.index('bc')

`index()` will only return the position of the *first* match it finds, so be careful.

In [None]:
'I am/Yet what I am/None cares or knows'.index('a')
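If the substring isn't present at all, `index()` raises a `ValueError`. Strings also have a `find()` method, not otherwise covered in this tutorial, which behaves the same way on a match but returns -1 when there is none. A brief sketch:

```python
line = 'I am/Yet what I am/None cares or knows'

print(line.index('a'))   # 2
print(line.find('a'))    # 2, same as index() when there is a match
print(line.find('z'))    # -1, where line.index('z') would raise a ValueError
```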

`startswith()` and `endswith()` return bools based on whether the string starts/ends with a substring.

In [None]:
'All things excellent are as difficult as they are rare.'.startswith('All ')

In [None]:
'Hao jiu bu jian'.endswith('ian')

`isalpha()`, `isnumeric()`, and `isalnum()` return `True` if all the characters are alphabetical, numeric, or either, respectively, and `False` otherwise.

In [None]:
'atcgtggactagcatcag'.isalpha()

In [None]:
'12'.isnumeric()

In [None]:
'6ny9'.isalnum()

Just be careful about whitespaces, since these aren't alphabetic characters. Also, `-` and `.` aren't numeric characters.
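A few quick checks make these caveats concrete:

```python
print('-12'.isnumeric())    # False: '-' is not numeric
print('3.14'.isnumeric())   # False: '.' is not numeric
print('a b'.isalpha())      # False: the space is not alphabetic
print('ab'.isalpha())       # True
print(''.isalpha())         # False: an empty string has no characters
```

So `isnumeric()` alone is not a reliable test of whether a string can be converted with `int()` or `float()`.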

You can stack methods on top of each other, too. For instance, `'ABC'.lower()` will return a string (`'abc'`). But we can use another method on `'abc'`, so we can use another method on `'ABC'.lower()`. That is, we can do this directly in one step:

In [None]:
'ABC'.lower().isalpha()

This is neat, but should you do it? Will it make your code more readable? What benefit(s) does this have? Again, think about the case at hand and ask these questions, because more condensed code isn't always better.

#### **Exercises**

1. Suppose you are writing a program in which a student enters the name of their faculty (Science, Arts, etc.). Create a function that takes this string as an argument to do the following (and then call it to ensure it works!):

  a. Print a message stating "Your faculty is the Faculty of " followed by the faculty.

  b. Add a new line of code before your print statement. In this line, process the user's input string by removing whitespace to the left and right of the letters, and also capitalize the first letter of every word (and make all other letters lowercase). Then print the processed string.

  c. Add another new line of code for further processing. In this line, remove all occurrences of the substring `'Faculty Of'`. Then remove whitespace again. Why might we want to do this? Should this line go before or after the line added in b.? Why did we remove whitespace a second time?

  d. Write a single line of code in which a., b., and c. are all performed. Is this preferable to using multiple lines?

### v. Escape characters and raw strings

Lastly, there are some important things you should know about strings. Many characters are encoded into strings in a special way. For instance, consider a "line break." How does a word processor, text editor (e.g., Notepad), or even the program you're using now know when to make a new line? Sometimes, a new line is made because the text is too long to fit on the screen; this is called "word wrapping." But this isn't what we mean here. We're interested in line breaks added deliberately by the writer, e.g., between two paragraphs. What's the difference between:

"A B C"

and

"A

B

C"

On a keyboard, of course, we hit "enter" to make a new line, but how do we encode this info into a string?

In [None]:
print('a
b
c')

Clearly, we can't just hit enter in the middle of our strings, or we get an error. In fact, we use `\n` ("n" for "new line") to indicate line breaks:

In [None]:
print('ABC')
print('A\nB\nC')

`'\n'` is a whitespace character, like `' '`. Another is `'\t'`, used for a tab.

In [None]:
print('A\tBC')
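Although `\n` and `\t` are typed as two keystrokes, each one is stored as a single character. The `len()` function from section ii shows this:

```python
print(len('A\tBC'))     # 4: A, tab, B, C
print(len('\n'))        # 1
print(len('A\nB\nC'))   # 5: three letters and two line breaks
```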

`\'` and `\"` make the quotation mark a part of the string, rather than signifying the end of the string.

In [None]:
print('I said, \'Hello!\'')
print("\"Hi,\" he replied.")

There are more "escape" characters like this, but we don't need to worry about them. They take the same form of a backslash followed by one or more characters. But there's an obvious problem: what if we actually want to use a backslash? One option is to use `'\\'`:

In [None]:
print('A double backslash encodes a single backslash: \\')

You can also use "raw" strings. In a raw string, a backslash is just a backslash, so you can't use escape characters. Indicate a raw string by putting an "r" in front of the first quotation mark for the string: `r'string_goes_here'`.

In [None]:
print('A\nB\tC\\D')
print(r'A\nB\tC\\D')
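Raw strings are handy for text that naturally contains many backslashes, such as Windows-style file paths. The path below is hypothetical and used only for illustration; note how `\n` and `\t` silently change the plain version.

```python
# Hypothetical Windows-style path, used only for illustration
plain_path = 'C:\new_data\table.txt'
raw_path = r'C:\new_data\table.txt'

print(plain_path)      # \n and \t are turned into a line break and a tab
print(raw_path)        # C:\new_data\table.txt
print(len(plain_path), len(raw_path))   # 19 21
```

The plain version is two characters shorter because each of `\n` and `\t` collapsed into one character.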

#### **Exercises**

1. The data in the table below has been turned into a string and assigned to the `string` variable below, then printed. Format the string by adding various `\n` and `\t` characters to the string. There should be a tab between columns and a line break between rows.

Element Name| Atomic Number
------------|------------------
Hydrogen    | 1
Nitrogen    | 7
Scandium    | 21
Lanthanum   | 57

In [None]:
string = 'Element NameAtomic NumberHydrogen1Nitrogen7Scandium21Lanthanum57'
print(string)