Lesson 7: Importing Libraries

One of the most common things you will do in Python machine learning projects is import libraries and use their functions. Let's learn how to do that.


Import Libraries

Here's a part of a typical Python script for machine learning:

main.py

import pandas as pd
# ... more libraries imported here

df = pd.read_csv('salaries.csv')
print(df.head())

# ... more code

So, generally, for smaller one-time scripts, you can keep all the logic in one .py file and import the libraries at the top.

The general syntax options for import are these:

import module0
import module1 as alias1
from package1 import module2
from package1.module3 import function1
from package2 import class1
from package2.subpackage1.module5 import function2
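To make the placeholder names concrete, each of the six forms can be tried with modules from the standard library. The sketch below pairs every form with a real import; the particular modules (json, collections, email, xml) are only illustrative picks and are not used elsewhere in this lesson.

```python
import json                                   # import module0
import collections as col                     # import module1 as alias1
from email import utils                       # from package1 import module2
from email.utils import parseaddr             # from package1.module3 import function1
from collections import Counter               # from package2 import class1
from xml.etree.ElementTree import fromstring  # from package2.subpackage1.module5 import function2

# Each imported name is used once to show how it is referenced afterwards.
encoded = json.dumps({"a": 1})
queue = col.deque([1, 2, 3])
name, address = parseaddr("Ada <ada@example.com>")
same_function = utils.parseaddr is parseaddr
letter_counts = Counter("banana")
root = fromstring("<row><salary>3000</salary></row>")

print(encoded, list(queue), name, address, same_function)
print(letter_counts["a"], root.find("salary").text)
```

Note that the form of the import decides how you refer to the name later: json.dumps() needs the module prefix, while parseaddr() and Counter() are used directly.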

Now let's look at another example and import the random and math modules:

main.py

import random
import math

print(random.randint(1, 10))
print(math.sqrt(64))

Here, we use the randint() and sqrt() functions provided by those modules. A single import statement makes everything the module defines available under the module's name, which is why the calls are written as random.randint() and math.sqrt(). As in PHP, you can create an alias with the as keyword:

main.py

import random as rd
import math as m

print(rd.randint(1, 10))
print(m.sqrt(64))

You often do not need to import whole modules; you can import only the functions or constants you use.

from math import sqrt, pi

print(pi)
print(sqrt(pi))
# 3.141592653589793
# 1.7724538509055159
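Selective imports and aliases can also be combined. The following sketch shows that a name imported with from ... import is the very same object as the one reached through the module, and that as works in this form too. The alias PI_VALUE and the circle radius are made up for the illustration.

```python
import math
from math import sqrt
from math import pi as PI_VALUE  # hypothetical alias, chosen for this example

# Both names point at the same function object inside the math module.
same_object = sqrt is math.sqrt

# The aliased constant behaves exactly like math.pi.
radius = 2
area = PI_VALUE * radius ** 2

print(same_object)
print(area)
```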

Some libraries are not part of the Python standard library, so you need to install them on your computer first.

For that, you need pip, the package installer for Python. Recent Python installers usually include it.

Then, you can install libraries according to their official instructions. For example, the pandas library is installed with this command:

pip install pandas
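Before importing a third-party library, it can be handy to check whether it is installed at all. The sketch below uses importlib.util.find_spec from the standard library for that; the helper name is_installed is an assumption made for this example, and the check is meant for top-level module names such as pandas.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


for name in ("math", "pandas"):
    if is_installed(name):
        print(f"{name}: available")
    else:
        print(f"{name}: missing (try: pip install {name})")
```

The output for pandas depends on your environment, while math is always available because it ships with Python.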

More Complex Project Structure

Let's look into one of the ways to structure a more complex project. The directory tree might be as follows:

my_project/
|-- my_package/
|   |-- __init__.py
|   |-- module1.py
|   |-- module2.py
|-- tests/
|   |-- __init__.py
|   |-- test_module1.py
|   |-- test_module2.py
|-- docs/
|-- data/
|-- scripts/
|-- my_single_file_module.py
|-- requirements.txt
|-- setup.py
|-- README.md

  • my_package/: This package holds your main code base. You can put your code into modules and submodules within this package. The __init__.py files indicate that the directories are Python packages. Empty __init__.py files are standard and considered good practice when the modules and submodules do not share code.
  • my_single_file_module.py: If your module consists of just one file, you can place it into the root of your repository.
  • tests/: This directory contains your test modules. Each test file typically corresponds to a module in your my_package/. You can use testing frameworks like pytest or unittest for your tests.
  • docs/: Here, you might include API documentation, usage guides, and other relevant information.
  • data/: This directory is often used to store data files that your program needs, such as CSV files or databases.
  • scripts/: Place standalone scripts here that are not part of your package but are helpful for your project.
  • requirements.txt: This file lists the Python packages and their versions required to run your project. You can generate this file using pip freeze > requirements.txt.
  • setup.py: This file contains information about your project, and it is used by tools like pip and setuptools for packaging and distribution.

This example is more of a guideline than a rule.
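To see how such a layout maps to import statements, here is a small self-contained sketch. It builds a throwaway my_package/ directory in a temporary folder, adds that folder to sys.path, and imports a function from module1. The function double and its contents are hypothetical; in a real project the files already exist and you would only write the import line.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as project_root:
    # Recreate a tiny part of the tree: my_package/__init__.py and module1.py.
    package_dir = Path(project_root) / "my_package"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")
    (package_dir / "module1.py").write_text("def double(x):\n    return 2 * x\n")

    # Python searches the directories in sys.path when resolving imports.
    sys.path.insert(0, project_root)
    try:
        importlib.invalidate_caches()
        from my_package.module1 import double  # directory.file -> package.module
        result = double(21)
    finally:
        sys.path.remove(project_root)

print(result)
```

When you run a script located in the project root, that directory is placed on sys.path for you, so the manual sys.path step above is only needed because the sketch creates its files elsewhere.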


Create And Import Your First Module

Create a single-file module, salary_generator.py, in your project root directory.

salary_generator.py

import random


def generate_salary_by_experience(years_experience):
    base_salary = 200 * years_experience + 2000
    return base_salary + random.randint(1, 199)

Now, in your main.py file, you can import and use this function.

from salary_generator import generate_salary_by_experience

if __name__ == '__main__':
    salary = generate_salary_by_experience(5)

    print(f'Your salary is {salary}')

When you execute main.py, you should see a result like this (the number varies between runs because of the random component):

Your salary is 3051

This is how easy it is.
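Because the function adds a random component, it helps to see which values it can return. The sketch below repeats the function from salary_generator.py so that it runs on its own, calls it many times, and records the smallest and largest result; the sample size of 1000 is an arbitrary choice.

```python
import random


def generate_salary_by_experience(years_experience):
    # Same function as in salary_generator.py, repeated so this sketch runs on its own.
    base_salary = 200 * years_experience + 2000
    return base_salary + random.randint(1, 199)


# Call the function many times and record the smallest and largest result.
samples = [generate_salary_by_experience(5) for _ in range(1000)]
lowest, highest = min(samples), max(samples)

print(f'Lowest: {lowest}, highest: {highest}')
```

For 5 years of experience the base salary is 200 * 5 + 2000 = 3000, so every result falls between 3001 and 3199.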

Why Do I Need if __name__ == '__main__':?

Let's see in practice what happens if you add a top-level print() call to your salary_generator.py file:

import random


def generate_salary_by_experience(years_experience):
    base_salary = 200 * years_experience + 2000
    return base_salary + random.randint(1, 199)


print('Salary Generator v0.1')

Then execute the main.py file and see the output:

Salary Generator v0.1
Your salary is 3138

The top-level code in salary_generator was executed during the import. That is fine when you run salary_generator.py as a standalone script, but it is unwanted when you import it into other modules.

The special variable __name__ tells you whether a Python file is being run as the main program or imported as a module into another script. When the file is run directly, Python sets __name__ to '__main__'; when it is imported, __name__ holds the module's name instead. The if __name__ == '__main__' condition is the common way to check for the first case.
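A quick way to observe this is to print __name__ from two places. In the sketch below, the standard random module stands in for any imported module; the variable names are chosen for the illustration only.

```python
import random

# Inside an imported module, __name__ holds the module's own name.
imported_name = random.__name__

# In the file you are running directly, __name__ is '__main__'.
current_name = __name__

print(f'Inside the imported module, __name__ is {imported_name!r}')
print(f'In this file, __name__ is {current_name!r}')
```

If you save this as a file and import it from another script instead of running it, the second line reports the file's module name, not '__main__'.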

Any code that should run only when the file is executed as the main program belongs under this condition.

salary_generator.py

import random


def generate_salary_by_experience(years_experience):
    base_salary = 200 * years_experience + 2000
    return base_salary + random.randint(1, 199)


if __name__ == '__main__':
    # Code here will only run if this script is executed as the main program
    # and not if it is imported as a module into another script.
    print('Salary Generator v0.1')

The same reasoning applies to the main.py file:

main.py

from salary_generator import generate_salary_by_experience

if __name__ == '__main__':
    salary = generate_salary_by_experience(5)

    print(f'Your salary is {salary}')

Modules are often intended to be import-only, so they omit that condition.
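For files that are meant to be run, one widespread convention is to move the script logic into a function and call only that function under the condition. This is a sketch of the idea, not something the lesson's files require: the function name main is a convention, and the salary function is repeated here so the example is self-contained.

```python
import random


def generate_salary_by_experience(years_experience):
    # Repeated from salary_generator.py so the sketch is self-contained.
    base_salary = 200 * years_experience + 2000
    return base_salary + random.randint(1, 199)


def main():
    # Hypothetical entry point; the name "main" is a convention, not a requirement.
    salary = generate_salary_by_experience(5)
    print(f'Your salary is {salary}')
    return salary


if __name__ == '__main__':
    main()
```

With this layout, another file can import and call main() explicitly, while importing the file alone still prints nothing.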