A115 Software Engineering

Bespoke cloud-based software platforms powering UK commerce since 2010

Re-introduction to Python - part 3. Exploring dictionaries - one of the most versatile data structures the language offers.

So far we've learned about some basic data types (numbers, strings, booleans) and some not-so-basic ones for representing collections of items (lists, tuples, sets). Now it is time to explore a more complex - but very commonly used in Python - data structure for managing collections of items.

Before we do, let's write a quick geography programme to model the world (it is good to be ambitious, even when you're just starting). To begin with, let's create a list, containing the countries of Europe:

europe_countries = ['Albania', 'Andorra', 'Armenia', 'Austria']  # ...and the rest

Pretend this is actually a complete list of all 50 or so sovereign states in Europe. Let's say we also have similar lists for Asia, Africa, and the other continents.

Now imagine the user of our Python programme enters the name of any country in the world and we want our programme to return the name of the continent in which that country is located.

Lists seemed like a good way to model our world at first, but we may be starting to have some doubts now. To answer the user's question, our programme would have to check every item of every continent list until it finds the country. That's neither convenient nor efficient, and it only gets slower as the lists grow.
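To make the problem concrete, here is a rough sketch of the list-based lookup. The shortened lists and the helper name find_continent are made up for illustration; the point is that the loop may have to compare the country against every item in every list before it finds a match (or gives up).

```python
# Hypothetical, shortened continent lists (the article pretends they are complete).
europe_countries = ['Albania', 'Andorra', 'Armenia', 'Austria']
asia_countries = ['Afghanistan', 'Bahrain', 'Bangladesh']
africa_countries = ['Algeria', 'Angola', 'Benin']

# Each continent's name paired with its list of countries.
continents = [
    ('Europe', europe_countries),
    ('Asia', asia_countries),
    ('Africa', africa_countries),
]


def find_continent(country):
    """Return the continent containing `country`, or None if it isn't listed."""
    for continent_name, countries in continents:
        # `in` on a list compares `country` against each item in turn.
        if country in countries:
            return continent_name
    return None


print(find_continent('Angola'))  # Africa
print(find_continent('Narnia'))  # None
```

Looking up 'Angola' means scanning the whole European and Asian lists first, which is exactly the waste a better data structure should avoid.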

Luckily, the new data structure I've been going on about is fine-tuned for exactly this kind of task. It's called a map: a data type which maps a set of items (like the names of countries) onto a collection of other items (like the names of continents). The first set is called the "keys" and the other collection is called the "values". In Python this data type is called a "dictionary" (or just dict). Think of looking up the translation of a word from one language to another. Python dictionaries are built on hash tables, which lets the computer look up the value for any given key in roughly the same time on average, however many items the dictionary holds.
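As a tiny illustration of the keys/values terminology, here is the translation analogy sketched as a Python dictionary. The three-word English-to-French vocabulary is just an example and not part of our geography programme.

```python
# Keys are English words, values are their French translations.
english_to_french = {'cat': 'chat', 'dog': 'chien', 'house': 'maison'}

print(english_to_french['dog'])          # chien
print(list(english_to_french.keys()))    # ['cat', 'dog', 'house']
print(list(english_to_french.values()))  # ['chat', 'chien', 'maison']
```

Since Python 3.7, dictionaries remember the order in which keys were inserted, which is why keys() and values() come back in the order they were written.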

We can now model our world more efficiently using a Python dictionary:

world_map = {
    'Afghanistan': 'Asia',
    'Albania': 'Europe',
    'Algeria': 'Africa',
    'Andorra': 'Europe',
    # ...and so on for every other country
}

(again, pretend we have the whole world mapped here.)

Notice how a Python dict is defined with notation similar to the one we use for sets (curly brackets). That's no accident: the keys of a dictionary (in this case the names of countries) behave like a set. Remember how sets only contain unique items and cannot have duplicates? The same is true for the keys of a dictionary, so each key maps to exactly one value. You can't have two different mappings for the same thing in a dictionary.
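A quick sketch of both points, using a trimmed-down world_map. Python accepts a repeated key in a dict literal but keeps only the last value, so there is still just one mapping per key. The view returned by .keys() also supports set operations such as & (intersection).

```python
# Repeating a key doesn't create a second mapping:
# the later value silently replaces the earlier one.
duplicated = {'Andorra': 'Europe', 'Andorra': 'Asia'}
print(duplicated)       # {'Andorra': 'Asia'}
print(len(duplicated))  # 1

# The keys view behaves like a set, so set operations work on it.
world_map = {'Afghanistan': 'Asia', 'Albania': 'Europe',
             'Algeria': 'Africa', 'Andorra': 'Europe'}
print(world_map.keys() & {'Albania', 'France'})  # {'Albania'}
```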

But notice also how the notation differs from that of sets. Here the commas separate key-value pairs, and the colon (:) indicates that a key (e.g. 'Afghanistan') maps to a value ('Asia').

We can easily look up which continent a country belongs to by "indexing" our dictionary directly with the country name, like so:

continent = world_map['Andorra']

And our entire programme can now be expressed rather concisely in this way:

country = input('Enter a country: ')
continent = world_map[country]
print(continent)

All of this is wonderful so far, but programming is often about handling special cases or applying different data transformations under different conditions. We tell the computer what to do under certain conditions by using so-called "conditional" statements. The most common example of such a statement in Python is the if statement, which tells the computer to take certain actions only if a certain condition is met.

Let's make this a bit more tangible. Imagine our client wants a programme that prints out the name of the continent a country belongs to. But they also want the programme to print the word "WOW" if the name of the country begins with the same letter as the name of its continent AND ALSO ends with the same letter as the name of its continent.

Using the if statement and some basic Boolean logic, we can now add a condition to our programme like this:

if (country[0] == continent[0]) and (country[-1] == continent[-1]):
    print("WOW")
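Putting the pieces together, here is one possible sketch of the whole programme. To make it easy to test, the hypothetical helper describe returns the lines the programme would print instead of reading input() and printing directly; the four-country world_map is, again, a stand-in for the full one.

```python
world_map = {
    'Afghanistan': 'Asia',
    'Albania': 'Europe',
    'Algeria': 'Africa',
    'Andorra': 'Europe',
}


def describe(country):
    """Return the lines our programme would print for `country`."""
    continent = world_map[country]
    lines = [continent]
    # Same first letter AND same last letter: the client wants a "WOW".
    if (country[0] == continent[0]) and (country[-1] == continent[-1]):
        lines.append('WOW')
    return lines


for name in ['Algeria', 'Albania']:
    print(name, describe(name))
# Algeria ['Africa', 'WOW']
# Albania ['Europe']
```

Algeria and Africa both start with "A" and end with "a", so only that branch produces the extra "WOW".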

Note that now there are two possible things that can happen when we run our programme (and it all depends on the name of the country the user inputs) - either the programme will print "WOW" or it won't. In programming we call this "branching". Every conditional statement we use increases the number of possible branches our programme can take when executing. This increases the complexity of our programme and makes it more difficult to test and to reason about. So we don't want to go overboard with conditionals - we try to use no more than what is necessary to get the job done.

The conditional expression following the if keyword can be a very simple boolean value, or a very complex boolean logic expression, with multiple ands, ors, nots, and so on. Once again, it is worth striving for simplicity here, because the more complex that expression gets, the more difficult our code is to read and to test.
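One simple way to keep a condition readable, sketched below with fixed example values, is to give each part of it a descriptive name before the if. The variable names are only illustrative.

```python
country = 'Algeria'
continent = 'Africa'

# Naming each part of the condition keeps the `if` line short
# and makes each part easy to check on its own.
same_first_letter = country[0] == continent[0]
same_last_letter = country[-1] == continent[-1]

if same_first_letter and same_last_letter:
    print('WOW')
```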

Finally, it's useful to know about a shorthand form of if, called a conditional expression, which comes in very handy for short, one-line expressions that can evaluate to either of two values. It looks like this:

continent = world_map[country] if country in world_map else None

If the condition (the part between the if and the else) evaluates to True, you get the value before the if; otherwise you get the value after the else.
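To see that the shorthand really is just a compact if/else, here is a small sketch comparing it with the spelled-out version. The function names lookup_short and lookup_long are hypothetical.

```python
world_map = {'Afghanistan': 'Asia', 'Albania': 'Europe',
             'Algeria': 'Africa', 'Andorra': 'Europe'}


def lookup_short(country):
    # Conditional expression: value_if_true if condition else value_if_false
    return world_map[country] if country in world_map else None


def lookup_long(country):
    # The same logic written as a full if/else statement
    if country in world_map:
        return world_map[country]
    else:
        return None


for country in ['Andorra', 'Absurdistan']:
    print(country, lookup_short(country), lookup_long(country))
# Andorra Europe Europe
# Absurdistan None None
```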

The last example was a good way to illustrate the conditional expression, but there's actually a nicer way to achieve the same result. Python dictionaries have a .get() method, which returns the value if the key is present and None otherwise:

continent = world_map.get(country)
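A quick check with a trimmed-down world_map shows that .get() behaves like the conditional expression: the value for a known key, None for an unknown one.

```python
world_map = {'Afghanistan': 'Asia', 'Albania': 'Europe',
             'Algeria': 'Africa', 'Andorra': 'Europe'}

for country in ['Albania', 'Absurdistan']:
    # .get() returns the value for a known key and None for an unknown one.
    print(country, world_map.get(country))
# Albania Europe
# Absurdistan None
```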

So .get() is a safer way of looking up items in a dictionary than direct indexing with [key]: world_map['Absurdistan'] raises a KeyError, which stops our programme unless we handle it, whereas world_map.get('Absurdistan') simply returns None. If you want the look-up to return a different default value when the key isn't in the dictionary, you can pass a second argument to .get() like this:

continent = world_map.get(country, "Country not found.")
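Here is a short sketch of the difference. The try/except block is only there so the script can report the KeyError instead of stopping; handling errors properly is a topic of its own.

```python
world_map = {'Afghanistan': 'Asia', 'Albania': 'Europe',
             'Algeria': 'Africa', 'Andorra': 'Europe'}

# Direct indexing with a missing key raises KeyError.
try:
    continent = world_map['Absurdistan']
except KeyError:
    print("world_map['Absurdistan'] raised KeyError")

# .get() with a second argument returns that default for missing keys.
print(world_map.get('Absurdistan', 'Country not found.'))  # Country not found.
print(world_map.get('Albania', 'Country not found.'))      # Europe
```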

Here are some questions and topics for further research:

  1. What are the various operations that can be performed on dictionaries and their elements?
  2. What are the limitations on what kinds of values can be used as keys in a dictionary?
  3. We saw one way of defining a new dictionary - using the literal expression d = {"key1": "value1", "key2": "value2", "key3": "etc."}. There is another way to do the same. How can you use the name of the data type (dict) to create a new dictionary?
  4. Can a single if statement create more than two branches in a programme?
  5. EXERCISE: Modify your superpower character programme to include a map of numeric values (between 1 and 100) for each superpower. For example, you may value the ability to become invisible more than the ability to fly, so you can assign invisibility a value of 60 and flying a value of 35. Ask the user to enter a superpower and output "COOL" if its value is more than 50 (but no more than 80), or "SUPER COOL" if it is more than 80.
  6. EXERCISE: Give your character a wallet. Use the right data structure to make the wallet hold balances in a number of different currencies. For example, you might want to give your character 50 GBP and 170 USD. Maybe some Euros as well, for when your character gets double vaccinated and travel to Europe becomes a thing again.