Overview
Teaching: 15 min
Exercises: 15 min
Questions
How can I use software that other people have written?
How can I find out what that software does?
Objectives
Explain what software libraries are and why programmers create and use them.
Write programs that import and use libraries from Python’s standard library.
Find and read documentation for standard libraries interactively (in the interpreter) and online.
Use import to load the contents of a package into a program’s memory.
Refer to the contents of a package as package_name.sub_module_name.
Python uses . to mean “part of”.
import math
math.pi
3.141592653589793
math.cos(math.pi)
-1.0
math.cos(pi) won’t work: the reference to pi doesn’t somehow “inherit” the function’s reference to math.
Use help to find out more about a package’s contents.
help(math)
Help on module math:
NAME
math
MODULE REFERENCE
http://docs.python.org/3.5/library/math
The following documentation is automatically generated from the Python
source files. It may be incomplete, incorrect or include features that
are considered implementation detail and may vary between Python
implementations. When in doubt, consult the module reference at the
location listed above.
DESCRIPTION
This module is always available. It provides access to the
mathematical functions defined by the C standard.
FUNCTIONS
acos(...)
acos(x)
Return the arc cosine (measured in radians) of x.
⋮ ⋮ ⋮
Use from...import... to load only specific items from a package.
from math import cos, pi
cos(pi)
-1.0
Use import...as... to give a package a short alias while importing it.
import math as m
m.cos(m.pi)
-1.0
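The three import forms above can be compared side by side. This is a minimal sketch; the variable names results and same_module are illustrative and do not come from the lesson.

```python
import math                  # whole module; names need the math. prefix
from math import cos, pi     # specific items; no prefix needed
import math as m             # whole module under a short alias

# All three spellings call the same function on the same constant.
results = [math.cos(math.pi), cos(pi), m.cos(m.pi)]
print(results)               # [-1.0, -1.0, -1.0]

# The alias is just another name for the same module object.
same_module = m is math
print(same_module)           # True
```

The choice between the forms is mostly about readability: the prefix shows where a name comes from, while from...import... keeps expressions short.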
Since we have downloaded a lot of files as part of the metasearch repository we would like to use a package that will allow us to work with these files more efficiently. The pathlib package is ideal for this. Let’s import its Path class to start working with paths:
from pathlib import Path
metasearch_dir = Path('metasearch')
metasearch_dir
metasearch
The absolute() method can be used to convert a relative path to an absolute path:
metasearch_dir.absolute()
/home/this_user/reproducibility_course/metasearch
In addition to some convenient methods to work with filesystem paths the
metasearch_dir object has more useful things to help us when we work with
paths. More specifically it has what are termed data descriptors
(https://docs.python.org/3/howto/descriptor.html). These are accessed in a
similar manner to the methods but without parentheses. As an example we’ll use
the parent data descriptor:
metasearch_dir.parent
PosixPath('.')
In this case the descriptor is not particularly useful because metasearch_dir
is specified as a relative path. We can use the absolute()
method to fix this:
metasearch_dir.absolute().parent
PosixPath('/home/this_user/reproducibility_course')
Another example of a useful Path data descriptor is the name
attribute:
metasearch_dir.name
'metasearch'
An error occurs if we attempt to call a data descriptor as if it were a method:
metasearch_dir.name()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-328-b6f442f14303> in <module>()
----> 1 metasearch_dir.name()
TypeError: 'str' object is not callable
If we try to use a method as if it were a data descriptor nothing useful is displayed:
metasearch_dir.absolute
<bound method Path.absolute of PosixPath('metasearch')>
If we keep track of what is a method and what is a data descriptor it helps us to avoid these mistakes. We can view the data descriptors at the end of the help for the Path class from the pathlib package:
help(Path)
⋮ ⋮ ⋮
| ----------------------------------------------------------------------
| Data descriptors inherited from PurePath:
|
| anchor
| The concatenation of the drive and root, or ''.
|
| drive
| The drive prefix (letter or UNC path), if any.
|
| name
| The final path component, if any.
|
| parent
| The logical parent of the path.
|
| parents
| A sequence of this path's logical parents.
|
| parts
| An object providing sequence-like access to the
| components in the filesystem path.
|
| root
| The root of the path, if any.
|
| stem
| The final path component, minus its last suffix.
|
| suffix
| The final component's last suffix, if any.
|
| suffixes
| A list of the final component's suffixes, if any.
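Several of the data descriptors listed above can be tried out without touching the filesystem. The sketch below assumes pathlib's PurePosixPath class (not used elsewhere in this lesson) so that the output is the same on every operating system; the path is taken from the crawler listing only as an example.

```python
from pathlib import PurePosixPath

# PurePosixPath only manipulates the text of a path; the file need not exist.
p = PurePosixPath('metasearch/crawler/process-csv.ipynb')

print(p.name)      # process-csv.ipynb
print(p.stem)      # process-csv
print(p.suffix)    # .ipynb
print(p.parent)    # metasearch/crawler
print(p.parts)     # ('metasearch', 'crawler', 'process-csv.ipynb')

# callable() tells a method apart from a plain attribute value.
print(callable(p.name))       # False: a string, so p.name() would fail
print(callable(p.as_posix))   # True: a method, so it needs parentheses
```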
Given an existing path object, it is easy to build a new one with minor
differences. For example, we can easily construct a path to a different file
in the same directory by using the with_name()
method.
crawler_dir = metasearch_dir.joinpath('crawler')
py = crawler_dir.with_name('transform.py')
py
metasearch/transform.py
We can use the
with_suffix()
method to create a new path that replaces the file name’s
extension with that of a different filetype.
ipynb = py.with_suffix('.ipynb')
ipynb
metasearch/transform.ipynb
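Both with_name() and with_suffix() return a new path and leave the original untouched. The following sketch illustrates this, assuming PurePosixPath so that it runs identically on any system; the variable names are illustrative.

```python
from pathlib import PurePosixPath

notebook = PurePosixPath('metasearch/crawler/transform.ipynb')

# Each call builds a new path object from the old one.
script = notebook.with_suffix('.py')
sibling = notebook.with_name('Load.ipynb')

print(script)     # metasearch/crawler/transform.py
print(sibling)    # metasearch/crawler/Load.ipynb
print(notebook)   # metasearch/crawler/transform.ipynb (unchanged)
```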
There is also a resolve() method for normalizing a
path by looking at the filesystem for directories and producing the
absolute path referred to by a name. Symbolic links will be resolved and on
Windows slashes will be replaced with backslashes. Our path does not
contain symbolic links and the resolved path is identical to the absolute
path:
metasearch_dir.resolve()
/home/this_user/reproducibility_course/metasearch
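The two methods can give different results when a path contains a '..' component. The sketch below works in a temporary directory so that it does not depend on the metasearch download; the directory layout and variable names are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'crawler').mkdir()
    messy = Path(tmp) / 'crawler' / '..'   # goes into crawler and back out

    made_absolute = messy.absolute()       # keeps the '..' component
    resolved = messy.resolve()             # consults the filesystem and collapses it

    keeps_dotdot = '..' in made_absolute.parts
    collapsed = resolved == Path(tmp).resolve()

print(keeps_dotdot)   # True
print(collapsed)      # True
```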
We can build up a path by providing directories to the joinpath() method. We pass each path segment as a separate argument:
metadata_dir = metasearch_dir.joinpath('crawler', 'metadata')
metadata_dir
metasearch/crawler/metadata
A more convenient way of passing a list as multiple arguments to a function is to use the star operator to “unpack” them:
subdirs = ['crawler', 'metadata']
metadata_dir = metasearch_dir.joinpath(*subdirs)
metadata_dir
metasearch/crawler/metadata
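As a quick check, the explicit and unpacked calls build equal paths. The sketch below also shows the / operator, which pathlib provides as another way of joining segments; PurePosixPath and the variable names are assumptions chosen so that the example runs anywhere.

```python
from pathlib import PurePosixPath

base = PurePosixPath('metasearch')
subdirs = ['crawler', 'metadata']

explicit = base.joinpath('crawler', 'metadata')   # one argument per segment
unpacked = base.joinpath(*subdirs)                # the list is unpacked into arguments
with_slash = base / 'crawler' / 'metadata'        # operator form of the same join

print(unpacked)                             # metasearch/crawler/metadata
print(explicit == unpacked == with_slash)   # True
```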
While it’s convenient to construct these paths we should remember that we don’t necessarily know whether the path that we have created exists. In order to check this we can use the exists() method:
metadata_dir.exists()
True
We can list the contents of a directory with the iterdir() method:
[x for x in crawler_dir.iterdir()]
[PosixPath('metasearch/crawler/Load.ipynb'),
PosixPath('metasearch/crawler/brain-development'),
PosixPath('metasearch/crawler/brainbox-csv'),
PosixPath('metasearch/crawler/clean-csv'),
PosixPath('metasearch/crawler/dataverse'),
PosixPath('metasearch/crawler/example_repository'),
PosixPath('metasearch/crawler/fcp-indi.gz'),
PosixPath('metasearch/crawler/fcp-indi'),
PosixPath('metasearch/crawler/fcp-info.ipynb'),
PosixPath('metasearch/crawler/ixi-crawl.ipynb'),
PosixPath('metasearch/crawler/metadata'),
PosixPath('metasearch/crawler/process-clean-csv.ipynb'),
PosixPath('metasearch/crawler/process-csv.ipynb'),
PosixPath('metasearch/crawler/transform.ipynb')]
If the path does not refer to a directory, iterdir() raises NotADirectoryError.
We can be more selective in the contents that we return if we use the glob()
method. This method allows us to find only files matching a pattern. We’ll try
to find files that end in “csv”:
[f for f in crawler_dir.glob('*.csv')]
[]
We didn’t find any csv files. There are some IPython notebooks in the crawler directory though:
[x for x in crawler_dir.glob('*.ipynb')]
[PosixPath('metasearch/crawler/Load.ipynb'),
PosixPath('metasearch/crawler/fcp-info.ipynb'),
PosixPath('metasearch/crawler/ixi-crawl.ipynb'),
PosixPath('metasearch/crawler/process-clean-csv.ipynb'),
PosixPath('metasearch/crawler/process-csv.ipynb'),
PosixPath('metasearch/crawler/transform.ipynb')]
To search subdirectories recursively, use the ** pattern. Multiple wildcards can be passed to glob methods.
[f for f in crawler_dir.glob('**/*-*.ipynb')]
[PosixPath('metasearch/crawler/fcp-info.ipynb'),
PosixPath('metasearch/crawler/ixi-crawl.ipynb'),
PosixPath('metasearch/crawler/process-clean-csv.ipynb'),
PosixPath('metasearch/crawler/process-csv.ipynb'),
PosixPath('metasearch/crawler/fcp-indi/fcp-indi-extractor.ipynb')]
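The difference between a plain and a recursive pattern is easier to see on a small tree. The sketch below builds a hypothetical directory in a temporary location (the file names are invented for illustration) and runs both kinds of pattern against it.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'sub').mkdir()
    for rel in ['a.ipynb', 'b-c.ipynb', 'notes.txt', 'sub/d-e.ipynb']:
        (root / rel).touch()

    # '*' matches within a single directory only.
    top_level = sorted(p.relative_to(root).as_posix()
                       for p in root.glob('*.ipynb'))
    # '**' also descends into subdirectories; '*-*' requires a hyphen in the name.
    recursive = sorted(p.relative_to(root).as_posix()
                       for p in root.glob('**/*-*.ipynb'))

print(top_level)   # ['a.ipynb', 'b-c.ipynb']
print(recursive)   # ['b-c.ipynb', 'sub/d-e.ipynb']
```

The results are sorted because glob() does not promise any particular order.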
We will be able to interpret the content from csv files more readily so we’ll recursively search for csv files in the crawler directory:
csvs = [f for f in crawler_dir.glob('**/*-*.csv')]
csvs
[PosixPath('metasearch/crawler/brainbox-csv/all-mris.csv'),
PosixPath('metasearch/crawler/clean-csv/ABIDE_Initiative-clean.csv'),
PosixPath('metasearch/crawler/clean-csv/ACPI-clean.csv'),
PosixPath('metasearch/crawler/clean-csv/ADHD200-clean.csv'),
PosixPath('metasearch/crawler/clean-csv/BrainGenomicsSuperstructProject-clean.csv'),
PosixPath('metasearch/crawler/clean-csv/CORR-clean.csv'),
PosixPath('metasearch/crawler/clean-csv/HypnosisBarrios-clean.csv'),
PosixPath('metasearch/crawler/clean-csv/IXI-clean.csv'),
PosixPath('metasearch/crawler/clean-csv/RocklandSample-clean.csv'),
PosixPath('metasearch/crawler/clean-csv/all-session.csv'),
PosixPath('metasearch/crawler/fcp-indi/rocklandsample/nki-rs_lite_r4_phenotypic_v1.csv'),
PosixPath('metasearch/crawler/fcp-indi/rocklandsample/nki-rs_lite_r6_phenotypic_v1.csv'),
PosixPath('metasearch/crawler/fcp-indi/rocklandsample/nki-rs_lite_r7_phenotypic_v1.csv'),
PosixPath('metasearch/crawler/fcp-indi/rocklandsample/nki-rs_lite_r8_phenotypic_v1.csv')]
The contents of a file can be read with read_bytes() or read_text().
test = csvs[0]
text_output = test.read_text()
print('Length of output:', len(text_output))
# The first 100 characters:
print(text_output[0:100])
Length of output: 984535
https://s3.amazonaws.com/fcp-indi/data/Projects/ABIDE_Initiative/Outputs/freesurfer/5.1/Pitt_0050002
We can use the open()
method if we wish to have more control over the reading
process. This can be helpful if we don’t wish to read the entire contents of a
file into memory. When using this method we retain a file handle for subsequent
reading from or writing to the file:
with test.open('r', encoding='utf-8') as file_handle:
    print(file_handle.readlines(200))
['https://s3.amazonaws.com/fcp-indi/data/Projects/ABIDE_Initiative/Outputs/freesurfer/5.1/Pitt_0050002/mri/T1.mgz,50002,abide_initiative\n', 'https://s3.amazonaws.com/fcp-indi/data/Projects/ABIDE_Initiative/Outputs/freesurfer/5.1/Pitt_0050003/mri/T1.mgz,50003,abide_initiative\n']
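A file handle can also be iterated over line by line, which avoids holding the whole file in memory. The sketch below is one way to stop after the first two lines; the file name and its contents are invented for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / 'example.csv'   # hypothetical file
    csv_path.write_text('url,id,project\na,1,x\nb,2,y\nc,3,z\n', encoding='utf-8')

    first_lines = []
    with csv_path.open('r', encoding='utf-8') as handle:
        # Iterating over the handle yields one line at a time.
        for line_number, line in enumerate(handle, start=1):
            first_lines.append(line.rstrip('\n'))
            if line_number == 2:
                break                      # the rest of the file is never read

print(first_lines)   # ['url,id,project', 'a,1,x']
```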
Files can be written with write_text() or write_bytes(). The
write_text() method will create the file if it does not exist and
overwrite existing content if it does.
f = metasearch_dir.joinpath('example.txt')
f.write_text('This is the content')
We’ll check that this worked by displaying the contents of all files with names starting with “ex” in the metasearch directory:
%cat {metasearch_dir}/ex*
This is the content
Getting help when using file IO operations
The pathlib package provides convenient access to some of Python’s built-in functions for reading and writing files. Extensive help is available for these functions.
As an example try:
help(open)
The unlink() method of the Path class allows us to delete files from our filesystem:
# exists before removing:
f.exists()
f.unlink()
# exists after removing:
f.exists()
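The whole write, read and delete cycle can be tried safely in a temporary directory. This is a minimal sketch with illustrative variable names; it also shows that a second write_text() call replaces the earlier content.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    f = Path(tmp) / 'example.txt'

    existed_before = f.exists()
    # write_text() returns the number of characters written.
    n_chars = f.write_text('This is the content', encoding='utf-8')
    f.write_text('Replaced', encoding='utf-8')   # overwrites, does not append
    content = f.read_text(encoding='utf-8')

    f.unlink()                                   # delete the file
    exists_after = f.exists()

print(existed_before, n_chars, content, exists_after)
# False 19 Replaced False
```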
File Properties
- Detailed information about a file can be accessed using the stat() method.
import time

stat_info = metasearch_dir.stat()
print('{}:'.format(stat_info))
# st_size is reported in bytes
print('  Size (bytes)  :', stat_info.st_size)
# On Unix, st_ctime is the time of the last metadata change rather than creation
print('  Created       :', time.ctime(stat_info.st_ctime))
print('  Last modified :', time.ctime(stat_info.st_mtime))
print('  Last accessed :', time.ctime(stat_info.st_atime))
os.stat_result(st_mode=16872, st_ino=173483174, st_dev=20, st_nlink=7, st_uid=37163, st_gid=37163, st_size=4096, st_atime=1488916170, st_mtime=1488907986, st_ctime=1488907986):
  Size (bytes)  : 4096
  Created       : Tue Mar 7 12:33:06 2017
  Last modified : Tue Mar 7 12:33:06 2017
  Last accessed : Tue Mar 7 14:49:30 2017
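The timestamps in a stat result are seconds since the epoch, and the size is in bytes. The sketch below converts both into friendlier values, using a temporary file of a known size; the file name and variable names are assumptions made for illustration.

```python
import tempfile
from datetime import datetime
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    f = Path(tmp) / 'example.bin'      # hypothetical file
    f.write_bytes(b'x' * 2048)         # exactly 2048 bytes

    info = f.stat()
    size_bytes = info.st_size
    size_kb = size_bytes / 1024        # convert bytes to kilobytes explicitly
    modified = datetime.fromtimestamp(info.st_mtime)

print(size_bytes, size_kb)             # 2048 2.0
print(modified.isoformat())            # local time of the last modification
```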
Locating the Right Package
You want to select a random character from a string:
bases = 'ACTTGCTTGAC'
- What standard library would you most expect to help?
- Which function would you select from that library? Are there alternatives?
When Is Help Available?
When a colleague of yours types help(math), Python reports an error:
NameError: name 'math' is not defined
What has your colleague forgotten to do?
Importing Specific Items
- Fill in the blanks so that the program below prints 90.0.
- Do you find this easier to read than preceding versions?
- Why wouldn’t programmers always use this form of import?
____ math import ____, ____
angle = degrees(pi / 2)
print(angle)
Path construction
Construct a Path object with the following list and the metasearch_dir as the parent directory. Does this path exist in the metasearch repository? If such a path didn’t exist how would we create it?
path_elements = ['csv_repository', 'brains.csv']
Solution
constructed_path = metasearch_dir.joinpath(*path_elements)
print('The path constructed is ', constructed_path)
print(constructed_path.exists())
The path constructed is metasearch/csv_repository/brains.csv
False
To create this path we can use the touch() method; however since the parent directory also doesn’t exist we should create that first:
constructed_path.parent.mkdir()
constructed_path.touch()
print(constructed_path.exists())
True
Find the movies
Find movies.csv in the downloaded metasearch repository and create a Path object called movies that points to it. Check that you have done this correctly by confirming that the methods of the Path object are available for your movies variable.
Solution
We can search for any movies.csv file by using the glob() method of our metasearch_dir Path object. Because glob() returns a generator, we use a list comprehension to collect the matches and then index into the resulting list to extract the desired Path object:
movies = [csv for csv in metasearch_dir.glob('**/movies.csv')][0]
print(movies)
metasearch/docs/lib/parcoords/examples/data/movies.csv
What’s in a csv?
Edit the code below to read the first 2 lines of all.csv located somewhere in the metasearch file tree. Write the output text to a file with the path “metasearch/my_dir/exercise.txt” using the Path class methods (so do not use IPython magic commands for filesystem manipulation).
nlines = 5
output = ''
with path_object.open('r') as f:
    for ii in range(nlines):
        output += 'csv-file "{!s}" , line {!s} : {!s}'.format(path_object.stem, ii+1, f.readline())
print(output)
Solution
We’ll first create a Path object with the name of our target file:
csv_file = [csv for csv in metasearch_dir.glob('**/all.csv')][0]
print(csv_file.exists())
True
Then we will create a Path object with the name of our output file. Since the parent directory of our output file does not exist we will create that too:
target_file = Path('metasearch/my_dir/exercise.txt')
target_file.parent.mkdir()
Finally we will read the contents of our csv_file and write the contents to our target file:
nlines = 2
output = ''
with csv_file.open('r') as f:
    for ii in range(nlines):
        output += 'csv-file "{!s}" , line {!s} : {!s}'.format(csv_file.stem, ii+1, f.readline())
target_file.write_text(output)
Key Points
Most of the power of a programming language is in its libraries.
A program must import a package in order to use it.
Use help to find out more about a package’s contents.
Import specific items from a package to shorten programs.
Create an alias for a package when importing it to shorten programs.