API documentation

calculate_pcr_product(sequence: FastaSequence, forward_primer: FastaSequence, reverse_primer: FastaSequence, min_product_length: Union[int, None] = None, max_product_length: Union[int, None] = None, header: bool = True, cols: str = 'all', output_file: Union[bool, str] = False) -> str

Returns the products amplified by a pair of primers against a single sequence.

This function is meant to be called by get_pcr_products. To use it on its own, you must supply FastaSequence objects. A FastaSequence is a simple object that couples a header with a DNA string. For example:

    >>> forward_primer = FastaSequence("test_forward_primer", "ACTG")
    >>> reverse_primer = FastaSequence("test_reverse_primer", "ATTA")
    >>> target_sequence = FastaSequence("target_sequence", "ATGCTGATGCATGCTA")

Inputs

sequence: FastaSequence

The fasta sequence to test for amplification.

forward_primer: FastaSequence

The forward primer to use.

reverse_primer: FastaSequence

The reverse primer to use. The function searches the target for the reverse complement of this sequence.

min_product_length: None | int

If provided, only return products whose length is greater than or equal to this number. Defaults to None, which sets no minimum.

max_product_length: None | int

If provided, only return products whose length is less than or equal to this number. Defaults to None, which sets no maximum.

header: bool

Whether to include a header line in the results. Defaults to True. False omits the header.

cols: str

Which columns to output. Defaults to "all", which outputs every column. To select a subset, supply a space-separated string of column names. For example, cols="fpri rpri pname" outputs only the names of the forward primer, reverse primer, and target sequence for each product found. Available options are:
  fpri - the name of the forward primer
  rpri - the name of the reverse primer
  start - the start location of the product in the target sequence
  end - the end location of the product in the target sequence
  length - the length of the product
  pname - the name of the sequence in which the target was found
  pseq - the nucleotide sequence of the amplified product

output_file: bool | str

The file to write the results to. Defaults to False, which writes no file. Providing a string writes the results to a file at that path. If set to True without providing a string, the output file will be of the form <DD-MM-YYYY_HH:MM:SS>.txt

Outputs

A tab-separated string containing all of the products amplified by the primer pair in the given sequence.

The fields are
  1. Forward primer name
  2. Reverse primer name
  3. Start position of the product in the target sequence
  4. End position of the product in the target sequence
  5. Product length
  6. Name of the target sequence
  7. The product
Source code in ispcr/__init__.py
def calculate_pcr_product(
    sequence: FastaSequence,
    forward_primer: FastaSequence,
    reverse_primer: FastaSequence,
    min_product_length: Union[int, None] = None,
    max_product_length: Union[int, None] = None,
    header: bool = True,
    cols: str = "all",
    output_file: Union[bool, str] = False,
) -> str:
    """Returns the products amplified by a pair of primers against a single sequence.

    This function is meant to be used by get_pcr_products, but if you'd like to use this separately,
    you must supply FastaSequence objects. A FastaSequence is just a convenient object to couple a header
    with a DNA string. For example,
        >>> forward_primer = FastaSequence("test_forward_primer", "ACTG")
        >>> reverse_primer = FastaSequence("test_reverse_primer", "ATTA")
        >>> target_sequence = FastaSequence("target_sequence", "ATGCTGATGCATGCTA")

    Inputs
    ------
    sequence: FastaSequence
        The fasta sequence to test for amplification.

    forward_primer: FastaSequence
        The forward primer to use.

    reverse_primer: FastaSequence
        The reverse primer to use. The function searches the target for the reverse
        complement of this sequence.

    min_product_length: None | int
        If provided, only return those products whose length are greater than or equal to this number.
        Defaults to None, which returns all products found.

    max_product_length: None | int
        If provided, only return those products whose length are less than or equal to this number.
        Defaults to None, which returns all products found.

    header: bool
        Whether to include a header line in the results. Defaults to True. False omits the header.

    cols: str
        Which columns to print out. Defaults to "all," which prints out all the columns. A string can be supplied to
        only output the strings of interest. For example, cols="fpri rpri pname" will only output the names of the forward
        primer, reverse primer, and the target sequence when a target is found.
        Available options are:
            fpri - the name of the forward primer
            rpri - the name of the reverse primer
            start - the start location of the product in the target sequence
            end - the end location of the product in the target sequence
            length - the length of the product
            pname - the name of the sequence in which the target was found
            pseq - the nucleotide sequence of the amplified product

    output_file: bool | str
        The file to write the results to. Defaults to False, which writes no file. Providing a string
        writes the results to a file at that path. If set to True without providing a string, the output file
        will be of the form <DD-MM-YYYY_HH:MM:SS>.txt



    Outputs
    -------
    A tab-separated string containing all of the products amplified by the primer pair in the given sequence.
    The fields are:
        1. Forward primer name
        2. Reverse primer name
        3. Start position of the product in the target sequence
        4. End position of the product in the target sequence
        5. Product length
        6. Name of the target sequence
        7. The product

    """

    products = []

    # Check cols string
    selected_column_indices = parse_selected_cols(cols)

    if header is True:
        products.append(filter_output_line(BASE_HEADER, selected_column_indices))

    forward_matches = [
        match.start()
        for match in re.finditer(forward_primer.sequence, sequence.sequence)
    ]

    for forward_match in forward_matches:
        tempseq = sequence[forward_match:]
        reverse_matches = [
            match.start()
            for match in re.finditer(
                reverse_complement(reverse_primer.sequence), tempseq
            )
        ]

        for reverse_match in reverse_matches:
            product = tempseq[: reverse_match + len(reverse_primer)]
            start = forward_match
            end = (
                forward_match + reverse_match + len(reverse_primer)
            )  # This is the end in the original sequence
            product_length = len(product)
            # if min_product_length is not None and product_length < min_product_length:
            #     continue
            if not desired_product_size(
                product_length, min_product_length, max_product_length
            ):
                continue

            product_line = f"{forward_primer.header}\t{reverse_primer.header}\t{start}\t{end}\t{product_length}\t{sequence.header}\t{product}"
            products.append(filter_output_line(product_line, selected_column_indices))

    results = "\n".join(products)

    if isinstance(output_file, str):
        with open(output_file, "w") as fout:
            fout.write(results)

    return results
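
The matching logic in the listing above can be sketched on plain strings. This is a minimal, self-contained illustration rather than the library's implementation: `find_products` is a hypothetical name, it takes `str` arguments instead of FastaSequence objects, and it wraps the primers in `re.escape`, which the original does not do.

```python
import re


def reverse_complement(dna: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    complements = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complements[base] for base in reversed(dna))


def find_products(target: str, forward: str, reverse: str):
    """Yield (start, end, product) for every forward/reverse primer pairing.

    Hypothetical helper: mirrors the nested search in calculate_pcr_product,
    but works on plain strings.
    """
    # The reverse primer binds the opposite strand, so its site appears in
    # the target as the reverse complement of the primer.
    reverse_site = reverse_complement(reverse)
    for fwd in re.finditer(re.escape(forward), target):
        downstream = target[fwd.start():]
        for rev in re.finditer(re.escape(reverse_site), downstream):
            # The product runs from the first base of the forward primer
            # through the last base of the reverse primer's binding site.
            product = downstream[: rev.start() + len(reverse)]
            end = fwd.start() + len(product)  # 0-based, exclusive
            yield fwd.start(), end, product


target = "AAACTGGGGGTAATCC"
products = list(find_products(target, forward="ACTG", reverse="ATTA"))
print(products)  # [(2, 14, 'ACTGGGGGTAAT')]
```

Note that `re.finditer` reports non-overlapping matches only, so a primer that overlaps its own previous match is not reported twice.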

get_pcr_products(primer_file: str, sequence_file: str, min_product_length: Union[int, None] = None, max_product_length: Union[int, None] = None, header: Union[bool, str] = True, cols: str = 'all', output_file: Union[bool, str] = False) -> str

Returns all the products amplified by a set of primers in all sequences in a fasta file.

Inputs

primer_file: str

The path to the fasta file containing the primers to be tested. Currently, this primer file is expected to contain exactly two sequences, with the forward primer appearing first. For example:

    >test_1.f
    AGTCA
    >test_2.r
    TTATGC

sequence_file: str

The path to the fasta file containing the sequences to test the primers against.

min_product_length: None | int

If provided, only return products whose length is greater than or equal to this number. Defaults to None, which sets no minimum.

max_product_length: None | int

If provided, only return products whose length is less than or equal to this number. Defaults to None, which sets no maximum.

header: bool

Whether to include a header line in the results. Defaults to True. False omits the header.

cols: str

Which columns to output. Defaults to "all", which outputs every column. To select a subset, supply a space-separated string of column names. For example, cols="fpri rpri pname" outputs only the names of the forward primer, reverse primer, and target sequence for each product found. Available options are:
  fpri - the name of the forward primer
  rpri - the name of the reverse primer
  start - the start location of the product in the target sequence
  end - the end location of the product in the target sequence
  length - the length of the product
  pname - the name of the sequence in which the target was found
  pseq - the nucleotide sequence of the amplified product

output_file: bool | str

The file to write the results to. Defaults to False, which writes no file. Providing a string writes the results to a file at that path. If set to True without providing a string, the output file will be of the form <DD-MM-YYYY_HH:MM:SS>.txt

Outputs

A tab-separated string containing all of the products amplified by the primers contained in the primer file.

The fields are
  1. Forward primer name
  2. Reverse primer name
  3. Start position of the product in the target sequence
  4. End position of the product in the target sequence
  5. Product length
  6. Name of the target sequence
  7. The product
Source code in ispcr/__init__.py
def get_pcr_products(
    primer_file: str,
    sequence_file: str,
    min_product_length: Union[int, None] = None,
    max_product_length: Union[int, None] = None,
    header: Union[bool, str] = True,
    cols: str = "all",
    output_file: Union[bool, str] = False,
) -> str:
    """Returns all the products amplified by a set of primers in all sequences in a fasta file.

    Inputs
    ------
    primer_file: str
        The path to the fasta file containing the primers to be tested. Currently, this primer
        file is expected to only contain two sequences, with the forward sequence appearing first.
        For an example:
            >test_1.f
            AGTCA
            >test_2.r
            TTATGC

    sequence_file: str
        The path to the fasta file containing the sequences to test the primers against.

    min_product_length: None | int
        If provided, only return those products whose length are greater than or equal to this number.
        Defaults to None, which returns all products found.

    max_product_length: None | int
        If provided, only return those products whose length are less than or equal to this number.
        Defaults to None, which returns all products found.

    header: bool
        Whether or not to print a header on the results. Defaults to True. False will not print out the header.

    cols: str
        Which columns to print out. Defaults to "all," which prints out all the columns. A string can be supplied to
        only output the strings of interest. For example, cols="fpri rpri pname" will only output the names of the forward
        primer, reverse primer, and the target sequence when a target is found.
        Available options are:
            fpri - the name of the forward primer
            rpri - the name of the reverse primer
            start - the start location of the product in the target sequence
            end - the end location of the product in the target sequence
            length - the length of the product
            pname - the name of the sequence in which the target was found
            pseq - the nucleotide sequence of the amplified product

    output_file: bool | str
        The file to write the results to. Defaults to False, which writes no file. Providing a string
        writes the results to a file at that path. If set to True without providing a string, the output file
        will be of the form <DD-MM-YYYY_HH:MM:SS>.txt


    Outputs
    -------
    A tab-separated string containing all of the products amplified by the primers contained in the primer file.
    The fields are:
        1. Forward primer name
        2. Reverse primer name
        3. Start position of the product in the target sequence
        4. End position of the product in the target sequence
        5. Product length
        6. Name of the target sequence
        7. The product

    """

    products = []

    # If anything gets passed for the header, it gets handled here instead of in
    # calculate_pcr_product.

    selected_column_indices = parse_selected_cols(cols)

    if header is True:
        products.append(filter_output_line(BASE_HEADER, selected_column_indices))

    primers = read_sequences_from_file(primer_file)
    forward_primer, reverse_primer = primers

    sequences = read_sequences_from_file(sequence_file)

    for sequence in sequences:
        new_products = calculate_pcr_product(
            sequence=sequence,
            forward_primer=forward_primer,
            reverse_primer=reverse_primer,
            min_product_length=min_product_length,
            max_product_length=max_product_length,
            header=False,
            cols=cols,
        )
        if new_products:
            products.append(new_products)

    results = "\n".join(products)

    if isinstance(output_file, str):
        with open(output_file, "w") as fout:
            fout.write(results)

    return results
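
The docstring describes a timestamped default file name for output_file=True, while the listing above only writes a file when output_file is a string. As a sketch only, one way such a name could be derived is shown below; `resolve_output_path` and its `now` parameter are hypothetical and are not part of the package.

```python
from datetime import datetime
from typing import Optional, Union


def resolve_output_path(
    output_file: Union[bool, str], now: Optional[datetime] = None
) -> Optional[str]:
    """Hypothetical helper: map the output_file argument to a path, or None."""
    if isinstance(output_file, str):
        # An explicit path is used as given.
        return output_file
    if output_file is True:
        # Assumed format, following the <DD-MM-YYYY_HH:MM:SS>.txt pattern
        # described in the docstring.
        stamp = (now or datetime.now()).strftime("%d-%m-%Y_%H:%M:%S")
        return f"{stamp}.txt"
    # False (or anything else) means: do not write a file.
    return None


print(resolve_output_path(True, datetime(2024, 1, 2, 3, 4, 5)))
```

File names containing colons are not valid on every filesystem (Windows rejects them), so a real implementation might prefer a different separator.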

FastaSequence

Generic fasta sequence class.

utils

This module contains various utilities used during in silico PCR.

desired_product_size(potential_product_length: int, min_product_length: Union[int, None] = None, max_product_length: Union[int, None] = None) -> bool

Determines if a potential product's size is in the user's desired product range.

Inputs

potential_product_length: int

The length of the potential product.

min_product_length: int | None

The minimum product size the user will accept. If None, there is no lower limit.

max_product_length: int | None

The maximum product size the user will accept. If None, there is no upper limit.

Outputs

A boolean for whether the product length is between the min and max product length (inclusive).

Example

    >>> desired_product_size(100, 75, 125)
    True
    >>> desired_product_size(200, 75, 125)
    False

Source code in ispcr/utils.py
def desired_product_size(
    potential_product_length: int,
    min_product_length: Union[int, None] = None,
    max_product_length: Union[int, None] = None,
) -> bool:
    """Determines if a potential product's size is in the user's desired product range.

    Inputs
    potential_product_length - int
        The length of the potential product
    min_product_length: int | None
        The minimum product size the user will accept. If None, there is no lower limit.
    max_product_length: int | None
        The maximum product size the user will accept. If None, there is no upper limit.

    Outputs
    A boolean for whether the product length is between the min and max product length.

    Example
    desired_product_size(100, 75, 125)
    True
    desired_product_size(200, 75, 125)
    False
    """

    if min_product_length is None:
        if max_product_length is None:
            return True
        else:
            return potential_product_length <= max_product_length
    else:
        if max_product_length is None:
            return min_product_length <= potential_product_length
        else:
            if max_product_length < min_product_length:
                raise ValueError(
                    "min_product_length cannot be larger than max_product_length"
                )
            return min_product_length <= potential_product_length <= max_product_length
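
The branches above amount to an inclusive range check in which None leaves one side unbounded. The sketch below restates that logic in a flatter form to make the boundary behaviour easy to see; `in_size_range` is a hypothetical name used only for this illustration.

```python
from typing import Optional


def in_size_range(
    length: int, minimum: Optional[int] = None, maximum: Optional[int] = None
) -> bool:
    """Inclusive range check where None means 'no bound on this side'."""
    # As in the listing above, inconsistent bounds are only detected
    # when both are supplied.
    if minimum is not None and maximum is not None and maximum < minimum:
        raise ValueError("min_product_length cannot be larger than max_product_length")
    above_min = minimum is None or minimum <= length
    below_max = maximum is None or length <= maximum
    return above_min and below_max


print(in_size_range(75, 75, 125))   # True: the lower bound is inclusive
print(in_size_range(125, 75, 125))  # True: the upper bound is inclusive
print(in_size_range(126, 75, 125))  # False
print(in_size_range(10))            # True: no bounds at all
```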

filter_output_line(output_line: str, column_indices: List[int]) -> str

Filters a single line of isPCR results based on selected column indices.

Source code in ispcr/utils.py
def filter_output_line(output_line: str, column_indices: List[int]) -> str:
    """
    Filters a single line of isPCR results based on selected column indices.
    """

    if not output_line.split():
        return ""
    elif column_indices == list(range(7)):
        return output_line
    else:
        columns = output_line.split()
        return "\t".join([columns[i] for i in column_indices])

get_column_indices(header_string: str) -> List[int]

Returns column indices based on a column header string.

Source code in ispcr/utils.py
def get_column_indices(header_string: str) -> List[int]:
    """
    Returns column indices based on a column header string.
    """
    return [COLUMN_HEADERS[col] for col in header_string.split()]

is_valid_cols_string(header_string: str) -> bool

Internal helper to check if a column header string is valid.

Source code in ispcr/utils.py
def is_valid_cols_string(header_string: str) -> bool:
    """
    Internal helper to check if a column header string is valid.
    """
    if not header_string:
        return False

    for col_name in header_string.split():
        if col_name not in COLUMN_HEADERS:
            return False

    return True

parse_selected_cols(cols: str) -> List[int]

Returns a list of int indices based on a column header string.

Source code in ispcr/utils.py
def parse_selected_cols(cols: str) -> List[int]:
    """
    Returns a list of int indices based on a column header string.
    """
    if cols != "all":
        if not is_valid_cols_string(cols):
            raise InvalidColumnSelectionError("Invalid header string.")
        else:
            selected_column_indices = get_column_indices(cols)
    else:
        selected_column_indices = list(range(len(BASE_HEADER.split())))
    return selected_column_indices
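
The column helpers above work together: a cols string is validated, mapped to indices, and then used to pick fields from each result line. The sketch below shows that flow end to end. The contents of COLUMN_HEADERS are not shown on this page, so the mapping here is an assumption based on the order of the fields in a result line; `select_columns` is a hypothetical name, and ValueError stands in for InvalidColumnSelectionError.

```python
from typing import Dict

# Assumed layout: the seven column names in the order they appear in a
# result line. The real COLUMN_HEADERS mapping may differ.
COLUMN_HEADERS: Dict[str, int] = {
    name: index
    for index, name in enumerate(
        ["fpri", "rpri", "start", "end", "length", "pname", "pseq"]
    )
}


def select_columns(line: str, cols: str) -> str:
    """Keep only the tab-separated fields named in a space-separated cols string."""
    if cols == "all":
        return line
    names = cols.split()
    unknown = [name for name in names if name not in COLUMN_HEADERS]
    if not names or unknown:
        # Stand-in for InvalidColumnSelectionError in the original code.
        raise ValueError(f"Invalid column selection: {cols!r}")
    fields = line.split("\t")
    # Fields are emitted in the order the caller listed them.
    return "\t".join(fields[COLUMN_HEADERS[name]] for name in names)


line = "test_1.f\ttest_2.r\t2\t14\t12\ttarget_sequence\tACTGGGGGTAAT"
print(select_columns(line, "fpri rpri pname"))
```

This sketch splits on tabs only, whereas filter_output_line above splits on any whitespace; with whitespace splitting, a sequence name that contains spaces would likely be spread across several columns.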

read_fasta(fasta_file: TextIO) -> Iterator[FastaSequence]

An iterator for fasta files.

Inputs
------
fasta_file: TextIO
    An open file for reading

Outputs
-------
An iterator yielding a FastaSequence (sequence name and sequence) for each record in a fasta file

Example
-------
    input_file = 'tests/test_data/sequences/met_r.fa.fasta'
    with open(input_file) as fin:
        for fasta_sequence in read_fasta(fin):
            print(f'{fasta_sequence.header}\n{fasta_sequence.sequence}')

Source code in ispcr/utils.py
def read_fasta(fasta_file: TextIO) -> Iterator[FastaSequence]:
    """An iterator for fasta files.

    Inputs
    ------
    fasta_file: TextIO
        An open file for reading

    Outputs
    -------
    An iterator yielding a FastaSequence (sequence name and sequence) for each record in a fasta file

    Example
    -------
    input_file = 'tests/test_data/sequences/met_r.fa.fasta'
    with open(input_file) as fin:
        for fasta_sequence in read_fasta(fin):
            print(f'{fasta_sequence.header}\n{fasta_sequence.sequence}')

    """
    name = None
    seq: List[str] = []
    for line in fasta_file:
        line = line.rstrip()
        if line.startswith(">"):
            if name:
                header = name
                sequence = "".join(seq)
                yield FastaSequence(header, sequence)
            name, seq = line[1:], []
        else:
            seq.append(line)
    if name:
        header = name
        sequence = "".join(seq)
        yield FastaSequence(header, sequence)
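
To try the parsing approach without a file on disk, the same loop can be run over an in-memory handle. This is a self-contained sketch: `FastaRecord` and `iter_fasta` are hypothetical stand-ins for FastaSequence and read_fasta, written so the example runs without the package installed.

```python
import io
from typing import Iterator, List, NamedTuple, Optional, TextIO


class FastaRecord(NamedTuple):
    """Stand-in for FastaSequence: a header paired with a sequence string."""

    header: str
    sequence: str


def iter_fasta(handle: TextIO) -> Iterator[FastaRecord]:
    """Yield one record per '>' header, joining wrapped sequence lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield FastaRecord(header, "".join(chunks))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield FastaRecord(header, "".join(chunks))


# A two-record primer file; the second sequence is wrapped over two lines.
primer_fasta = io.StringIO(">test_1.f\nAGTCA\n>test_2.r\nTTA\nTGC\n")
forward_primer, reverse_primer = iter_fasta(primer_fasta)
print(forward_primer.header, forward_primer.sequence)  # test_1.f AGTCA
print(reverse_primer.header, reverse_primer.sequence)  # test_2.r TTATGC
```

Unpacking into exactly two names, as get_pcr_products does with its primer list, raises ValueError if the input holds any other number of records.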

read_sequences_from_file(primer_file: str) -> List[FastaSequence]

Reads a fasta file, converts the sequences to FastaSequences, and returns them in a list.

Source code in ispcr/utils.py
def read_sequences_from_file(primer_file: str) -> List[FastaSequence]:
    """
    Reads a fasta file, converts the sequences to FastaSequences, and returns them in a list.
    """
    sequences = []
    with open(primer_file) as fin:
        for fasta_sequence in read_fasta(fin):
            sequences.append(fasta_sequence)

    return sequences

reverse_complement(dna_string: str) -> str

Returns the reverse complement of a DNA string.

Inputs

dna_string: str

A string representing a DNA sequence. Supported bases are A, C, G, and T.

Outputs

The reverse complement of dna_string.

Raises

KeyError

Raised if there is a base in dna_string that is not one of ACGT.

Example

    >>> reverse_complement('GCTGA')
    'TCAGC'

Source code in ispcr/utils.py
def reverse_complement(dna_string: str) -> str:
    """Returns the reverse complement of a DNA string.

    Inputs
    ------
    dna_string: str
        A string representing a DNA sequence. Supported bases are A, C, G, and T.

    Outputs
    -------
    The reverse complement of dna_string.

    Raises
    ------
    KeyError
        Raised if there is a base in dna_string that is not one of ACGT.

    Example
    -------
    >>> reverse_complement('GCTGA')
    'TCAGC'
    """

    complements = {"A": "T", "C": "G", "G": "C", "T": "A"}
    try:
        rev_seq = "".join([complements[s] for s in dna_string[::-1]])
    except KeyError as e:
        print(f"Base {e.args} not supported.")
        raise
    return rev_seq
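
For comparison, the same result can be obtained with a translation table instead of a per-base dictionary lookup. This is an alternative technique, not what the listing above does; `reverse_complement_translate` is a hypothetical name, and the explicit check is there because `str.translate` would otherwise pass unsupported characters through unchanged.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement_translate(dna_string: str) -> str:
    """Reverse complement using str.translate; rejects bases other than ACGT."""
    unsupported = set(dna_string) - set("ACGT")
    if unsupported:
        # Raise KeyError to match the behaviour documented for reverse_complement.
        raise KeyError(f"Unsupported base(s): {sorted(unsupported)}")
    # Complement every base, then reverse the string.
    return dna_string.translate(_COMPLEMENT)[::-1]


print(reverse_complement_translate("GCTGA"))  # TCAGC
```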