Python data structure

Python

Python basics

Author

Chi Zhang

Published

December 2, 2024

Resources: Python Data Science Handbook by Jake VanderPlas

Strings

print('Winter' + 'Park')
WinterPark

Check data types

movie = 'Matrix'
print(type(movie))
<class 'str'>
channel = 'cnn'
print(channel.upper())  # CNN
print(channel.capitalize()) # Cnn
print(channel.find('c')) # index of the first 'c'
print(len(channel)) 
CNN
Cnn
0
3

Useful functions

movies = ['Avatar', 'Titanic', 'Alien']
movies.append('Avengers')
movies.insert(2, 'Terminator')
print(movies[3])
Alien
ages = [25, 33, 19]
sorted(ages)
sorted(ages, reverse = True)

# sort strings
players = ["Zoe", "Liam", "Emma", "Noah", "Olivia"]
srt_players = sorted(players)
print(srt_players)
['Emma', 'Liam', 'Noah', 'Olivia', 'Zoe']

Data structure

Type    Mutable  Ordered  Indexing  Duplicates
Lists   Yes      Yes      Yes       Yes
Tuples  -        Yes      Yes       Yes
Sets    Yes      -        -         -

Lists

Square brackets

list1 = ['tea', 'jam', 'scone']
list1

# different types of data can be mixed
list2 = ['tea', 20, True]
list2
['tea', 20, True]

Index starts from 0

list1[0]
'tea'

Lists are mutable: you can change the values in a list after it’s created.

list1[0] = 'milk'
list1
['milk', 'jam', 'scone']

Indexing also works on strings. However, strings are immutable: we cannot replace one character with another in place.

string = 'milk'
string[3] # prints the 4th character
'k'
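
A minimal sketch of what string immutability means in practice, reusing the 'milk' string from above. The names word and new_word are illustrative and not part of the original example.

```python
word = 'milk'

# Assigning to a position in a string raises a TypeError.
try:
    word[0] = 's'
except TypeError as err:
    print('TypeError:', err)

# Build a new string from a slice instead; the original is left unchanged.
new_word = 's' + word[1:]
print(new_word)  # silk
print(word)      # milk
```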

Slicing

The stopping index is exclusive: [0:2] returns the 1st and 2nd elements.

animals = ["cat", "dog", "bird", "cow"]
print(animals[0:2]) # indices 0 and 1 (index 2 excluded): 1st and 2nd
print(animals[1:3]) # indices 1 and 2 (index 3 excluded): 2nd and 3rd
['cat', 'dog']
['dog', 'bird']

Two consecutive indices return only one value.

print(animals[0:1]) # 1st
print(animals[2:3]) # 3rd
['cat']
['bird']

An easier way to remember this: for [a:b], count from the (a+1)th element to the bth element. Example: [3:5] gives the 4th and 5th; [2:3] gives the 3rd to the 3rd, which is just the 3rd.
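
The counting rule can be checked with a small sketch: a slice [a:b] starts at index a and stops before index b, so it holds b - a elements. The variables a, b and piece are illustrative names.

```python
animals = ["cat", "dog", "bird", "cow"]

a, b = 1, 3
piece = animals[a:b]

# Starts at index a (the (a+1)th element) and stops before index b,
# so the slice contains b - a elements.
print(piece)       # ['dog', 'bird']
print(len(piece))  # 2
```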

Omitting the starting index or the stopping index

cart = ['lamp', 'candles', 'chair', 'carpet']
print(cart[:2]) # stopping at 2nd
print(cart[1:]) # starting at 2nd
['lamp', 'candles']
['candles', 'chair', 'carpet']

Negative indexing

print(cart[-1]) # last one
print(cart[-3:]) # last 3
carpet
['candles', 'chair', 'carpet']
cart = ['lamp', 'candles', 'chair', 'carpet']
print(cart[1:-1])
['candles', 'chair']
x = 15
x += 5
print(x)

prices = [15, 19, 24, 8, 5]
for i in prices:
  i += 5
  print(i)
20
20
24
29
13
10
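
One point worth noting about the loop above: i += 5 only rebinds the loop variable, so the list itself is not changed. The sketch below shows this and one possible way to keep the raised values; the name raised is an assumption for illustration.

```python
prices = [15, 19, 24, 8, 5]

for i in prices:
    i += 5  # changes the loop variable only, not the list

print(prices)  # [15, 19, 24, 8, 5]

# To keep the new values, collect them in a separate list.
raised = []
for p in prices:
    raised.append(p + 5)

print(raised)  # [20, 24, 29, 13, 10]
```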

List comprehension

Syntax: <variable> = [<expression> for <item> in <iterable>]

nums = []
for x in range(1,11):
  nums.append(x)
print(nums)

# alternatively,
nums = [x for x in range(1, 11)]
nums
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

<variable> = [<expression> for <item> in <iterable>]

nums = [x*2 for x in range(10)]
nums

tags = ['travel', 'vacation']
hashtags = ['#' + x for x in tags]
hashtags

# capitalise
Tags = [x.capitalize() for x in tags]
Tags
['Travel', 'Vacation']

A condition can be added too

users = ["Brandon", "Emma", "Brian", "Sophia", "Bella", "Ethan", "Ava", "Benjamin", "Mia", "Chloe"]
group = [x for x in users if x[0] == "B"]
print(group)
['Brandon', 'Brian', 'Bella', 'Benjamin']
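
For comparison, here is a sketch of the same filter written as an ordinary loop, using a shortened version of the users list. The names group_loop and group_comp are illustrative.

```python
users = ["Brandon", "Emma", "Brian", "Sophia", "Bella"]

# Ordinary loop: append only the names that pass the condition.
group_loop = []
for x in users:
    if x[0] == "B":
        group_loop.append(x)

# The same logic as a list comprehension with a condition.
group_comp = [x for x in users if x[0] == "B"]

print(group_loop)                # ['Brandon', 'Brian', 'Bella']
print(group_loop == group_comp)  # True
```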

Tuples ()

Use parentheses. Tuples are immutable, which makes them useful when data shouldn’t be accidentally modified. For the same reason, tuples have no append method.

b_date = (21, 'May', 2004)
b_date[1]
'May'
scores = (7, 9, 9, 8, 9)
print('# of 7:', scores.count(7))
print('# of 9:', scores.count(9))
# of 7: 1
# of 9: 3
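
A short sketch of tuple immutability, reusing b_date from above. The name new_date is illustrative; building a new tuple from slices is one possible workaround, not the only one.

```python
b_date = (21, 'May', 2004)

# Item assignment on a tuple raises a TypeError.
try:
    b_date[1] = 'June'
except TypeError as err:
    print('TypeError:', err)

# Tuples do not have an append method.
print(hasattr(b_date, 'append'))  # False

# To "change" a value, build a new tuple from slices.
new_date = b_date[:1] + ('June',) + b_date[2:]
print(new_date)  # (21, 'June', 2004)
```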

Unpacking a tuple. The number of variables needs to match the length of the tuple; to handle an unknown number of elements, use *. The starred variable collects the remaining elements into a list [].

grades = (76, 81, 96)
math, history, art = grades
print(math)
math, *others = grades
print(others)
76
[81, 96]
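
The starred variable does not have to come last. As a sketch with a longer, made-up grades tuple, it can also collect everything between the first and last element:

```python
grades = (76, 81, 96, 88, 67)  # illustrative values

# The starred name takes whatever is left after first and last are filled.
first, *middle, last = grades

print(first)   # 76
print(middle)  # [81, 96, 88]
print(last)    # 67
```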

Sets {}

Use curly brackets. Sets are unordered, so they do not support indexing or slicing.

guests = {"Mery", "Anna", "Jonathan"}
print(guests)
# print(guests[0]) # raises TypeError: sets do not support indexing
{'Jonathan', 'Anna', 'Mery'}

Sets cannot contain duplicates; repeated values are automatically dropped.

friends = {'Anna', 'Mery', 'Mery', 'Jonathan'}
print(friends)
{'Jonathan', 'Anna', 'Mery'}
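
Because duplicates are dropped, converting a list to a set is a common way to get its unique values. A small sketch, where names and unique_names are illustrative; sorted() is used only to print in a predictable order, since sets are unordered.

```python
names = ['Anna', 'Mery', 'Mery', 'Jonathan']

# Converting to a set keeps one copy of each value.
unique_names = set(names)

print(len(names))            # 4
print(len(unique_names))     # 3
print(sorted(unique_names))  # ['Anna', 'Jonathan', 'Mery']
```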

Sets are mutable, so you can add and remove items. However, sets have no append method since they are unordered; use add() instead.

guests = {"Mery", "Anna", "Jonathan"}
guests.add('Robert')
guests.remove('Mery')
print(guests)
{'Jonathan', 'Anna', 'Robert'}

To clear the set,

guests.clear()
print(guests)
set()

To join sets, use set1.union(set2); duplicates are kept only once. To find the elements that are in set1 but not in set2, use set1.difference(set2).

set1 = {'apple', 'banana'}
set2 = {'banana', 'cherry'}
combined_set = set1.union(set2)
print(combined_set)
unique_1 = set1.difference(set2)
unique_2 = set2.difference(set1)
print([unique_1, unique_2]) # print together
{'banana', 'cherry', 'apple'}
[{'apple'}, {'cherry'}]
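
The same operations can also be written with operators. As a sketch: | corresponds to union() and - to difference(). The & operator (intersection, the elements in both sets) is not covered above and is added here only for completeness.

```python
set1 = {'apple', 'banana'}
set2 = {'banana', 'cherry'}

# | is the operator form of union(), - of difference().
print((set1 | set2) == set1.union(set2))       # True
print((set1 - set2) == set1.difference(set2))  # True

# & keeps only the elements found in both sets.
common = set1 & set2
print(common)  # {'banana'}
```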

Dictionary

The values can be of any type, including a list. The keys have to be immutable (more precisely, hashable), such as strings, numbers, or tuples.

product = {
  'name': 'pen',
  'is_red':True,
  'price': 79
}
print(product)
{'name': 'pen', 'is_red': True, 'price': 79}
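
A sketch of the rule about keys: a tuple works as a key, while a list raises a TypeError. The events dictionary and the list_key_failed flag are made up for illustration.

```python
# A tuple is immutable, so it can be used as a key.
events = {('May', 21): 'birthday'}
print(events[('May', 21)])  # birthday

# A list is mutable, so using it as a key raises a TypeError.
list_key_failed = False
try:
    bad = {['May', 21]: 'birthday'}
except TypeError as err:
    list_key_failed = True
    print('TypeError:', err)
```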

When the key is a string, it needs to be in quotes.

dancer = {
  'name' : 'maria',
  'points' : [9, 10, 7]
}
dancer
{'name': 'maria', 'points': [9, 10, 7]}

Keys have to be unique. If a key is repeated, the later value overwrites the earlier one.

contact = {
  'name' : 'maria',
  'company': 'Google',
  'company': 'facebook'
}
contact
{'name': 'maria', 'company': 'facebook'}

To access a value, use ['key'] or get('key')

contact['company']
contact.get('company')
contact.get('baba', 'puff') # returns 'puff' if the key 'baba' does not exist
'puff'
contact.keys()
contact.values()
contact.items() # all pairs
dict_items([('name', 'maria'), ('company', 'facebook')])
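
The two access styles differ when the key is missing. In this sketch, the key 'phone' and the fallback 'n/a' are illustrative: square brackets raise a KeyError, while get() returns None or the given default.

```python
contact = {'name': 'maria', 'company': 'facebook'}

# Square brackets raise a KeyError for a missing key.
try:
    contact['phone']
except KeyError as err:
    print('KeyError:', err)

# get() returns None, or the default passed as the second argument.
print(contact.get('phone'))         # None
print(contact.get('phone', 'n/a'))  # n/a
```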

Change values for dictionary with update()

user = {
  'name': 'Albert',
  'age': 29
}
user.update({'age': 30})
print(user['age'])
print(user.items())
30
dict_items([('name', 'Albert'), ('age', 30)])

pop() removes the item with the specified key

car = {
  'brand': 'Ford',
  'model': 'mustang',
  'color': 'red'
}
# remove color key
car.pop('color')
print(car)
# check if values are in the dictionary
'mustang' in car.values()
{'brand': 'Ford', 'model': 'mustang'}
True
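
pop() also returns the value it removed, and it accepts an optional default for keys that may not exist. A sketch, where the key 'year' and the names removed and missing are illustrative:

```python
car = {'brand': 'Ford', 'model': 'mustang', 'color': 'red'}

# pop() returns the removed value.
removed = car.pop('color')
print(removed)  # red

# With a default, popping a missing key does not raise a KeyError.
missing = car.pop('year', None)
print(missing)  # None

print(car)  # {'brand': 'Ford', 'model': 'mustang'}
```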

Looping over a dictionary directly returns the keys (not the values)

car = {
  'brand': 'Ford',
  'model': 'mustang',
  'color': 'red'
}
for i in car:
  print(i)

for i in car.values(): # this prints the values
  print(i)

for i in car.items(): # this prints the pairs
  print(i)
brand
model
color
Ford
mustang
red
('brand', 'Ford')
('model', 'mustang')
('color', 'red')
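
Since items() yields (key, value) tuples, each pair can be unpacked directly in the loop header. A sketch, where the lines list is an illustrative helper that collects the formatted text:

```python
car = {'brand': 'Ford', 'model': 'mustang', 'color': 'red'}

lines = []
for key, value in car.items():  # unpack each (key, value) pair
    lines.append(key + ': ' + value)

for line in lines:
    print(line)
# brand: Ford
# model: mustang
# color: red
```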

Unpack dictionaries with ** to merge them

dict1 = {'a': 1, 'b': 2}
dict2 = {'c': 3, 'd': 4}
merged_dict = {**dict1, **dict2}
print(merged_dict)
{'a': 1, 'b': 2, 'c': 3, 'd': 4}
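
When the dictionaries share a key, the value from the dictionary unpacked later wins, consistent with the rule that keys must be unique. A sketch with made-up dictionaries defaults and overrides:

```python
defaults = {'color': 'red', 'size': 'M'}
overrides = {'size': 'L'}

# 'size' appears in both; the later dictionary's value is kept.
merged = {**defaults, **overrides}
print(merged)    # {'color': 'red', 'size': 'L'}

# The original dictionaries are not modified.
print(defaults)  # {'color': 'red', 'size': 'M'}
```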