/usr/share/doc/blast2/seedtop.html is in blast2 1:2.2.26.20120620-8.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 | <?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<link rel="stylesheet" href="http://www.ncbi.nlm.nih.gov/corehtml/ncbi2.css" type="text/css" />
<style type="text/css">
body {background-color: #FFFFFF; color: #000000;}
div.c5 {margin-left: 2em;}
h3.c4 {font-weight: bold;}
p.c3 {font-size: 120%; font-weight: bold;}
h3.c2 {font-size: 120%; font-weight: bold;}
b.c1 {color: white;}
a.c1 {color: #FFFFFF;}
.breadcrumb {font-size: 90%; font-family: sans-serif; text-decoration: none; font-weight: bold; color: white; }
.breadcrumb a { text-decoration: none; color: white; }
.sidebar { font-size: 80%; font-family: arial, helvetica, sans-serif; text-decoration:none; background-color: #336699; }
.sidebar li {color: #ccccff;}
.sidebar a {color: #ffffff; text-decoration:none;}
.sidebar p {color: #ffcc66;}
.content { position: fixed; margin-left: 125; }
.footer {clear: both;}
.blast { font-family: sans-serif; }
.blast h3 { font-size: 14px; }
.blast a { text-decoration: none; }
.blast li { font-size: 12px; }
.fix-width { font-family: courier; font-size: 80%; }
.pink { font-weight: bold; font-size: 100%; color: #ff8888; font-family: arial, helvetica, sans-serif; }
.text2 {font-size: 10pt;font-family: arial,helvetica,sans-serif;}
.tbl_title{font-weight: bold; font-size: 10pt; color: #0000FF; font-family: arial, helvetica, sans-serif;}
</style>
<title>Seedtop Document</title>
</head>
<body>
<table bgcolor="#eeeeff" width="600" class="text2">
<tbody>
<tr><td align="center" class="tbl_title">Program Options for seedtop</td></tr>
<tr><td align="center" class="text2">Tao Tao, Ph.D.<br/>User Service<br/>NCBI, NLM, NIH</td></tr>
<tr><td> </td></tr>
<tr><td class="tbl_title">Table of Content</td></tr>
<tr><td>
<ul class="1">
<li><a href="#1">1. Introduction</a></li>
<li><a href="#2">2. Setup</a></li>
<li><a href="#3">3. Program options</a></li>
<li><a href="#4">4. Practical Usage</a>
<ul>
<li><a href="#4.1">4.1 Pattern specification</a></li>
<li><a href="#4.2">4.2 patmatchp</a></li>
<li><a href="#4.3">4.3 patternp</a></li>
<li><a href="#4.4">4.4 patseedp</a></li>
<li><a href="#4.5">4.5 seedp</a></li>
<li><a href="#4.6">4.6 For nucleotide searches</a></li></ul></li>
<li><a href="#5">5. Technical Support</a></li>
</ul><br /></td></tr>
<tr><td> </td></tr>
<tr><td class="tbl_title"><a name="1">1. Introduction</a></td></tr>
<tr><td class="text2">
seedtop is a little-known program found in the NCBI standalone blast package, whose main function
is to search for patterns in an input sequence or database. It has four modes of usage, which are
referred to as "subprograms": two for pattern searches from a input query or database only and
two for pattern initiated sequence alignment. The following table lists these subprograms, their
functions, and required inputs.
<br />
<br />
<table width="600" border="1" class="text">
<tr><td colspan="3" class="tbl_title" align="center">Table 1.1 Subprogram Fucntion of seedtop</td></tr>
<tr class="tbl_title">
<td width="100">Program Call ¹</td><td>Functions</td><td>Required Inputs</td></tr>
<tr><td>-p patmatch</td><td>Search for patterns in an input sequence</td>
<td>Pattern (-k) and sequence (-i)</td></tr>
<tr><td>-p pattern</td><td>Search for patterns in an input database</td>
<td>Same as above</td></tr>
<tr><td>-p patseed</td><td>Search for patterns in the query and align the query against a database</td>
<td>Pattern (-k), input sequence (-i), and target database (-d)</td></tr>
<tr><td>-p seed</td><td>Search for specific pattern in the query and align the query against a database</td>
<td>Same as above ²</td></tr>
<tr><td colspan="3" class="medium2">NOTE:<br/>
¹ The program strings listed are for nucleotide searches. For protein searches, add lowercase p to the program name.<br/>
² The pattern file needs to have an extra HI initialed line to specify the position in the input sequence at which the pattern occurrence of interest starts.</td></tr></table>
</td></tr>
<tr><td> </td></tr>
<tr><td class="tbl_title"><a name="2">2. Setup</a></td></tr>
<tr><td>
Installation of the standalone blast archive is fairly easy. Once the archive is placed in a desired
directory and extracted, the whole package will be installed in a newly created subdirectory called
blast-2.2.13 (assuming 2.2.13 release here). All the programs, including seedtop, will be in the
blast-2.2.13/bin/ subdirectory (blast-2.2.13\bin\ for PC).
<p/>
Appropriate setup requires the creation of .ncbirc configuration file, which blast programs (including
seedtop) read upon startup to locate the appropriate files needed. In this .ncbirc, we can specify the
location of the DATA directory and the BLASTDB directory using the following lines:
<blockquote>
[NCBI]<br/>
DATA=/path/data<br/>
<br/>
[BLAST]<br/>
BLASTDB=/path/db<br/>
</blockquote>
<p/>
The [NCBI] section is used by most of the NCBI programs to locate the data directory and retrieve specific
files needed (MATRIX file for example). The [BLAST] section specifies the path to the directory where
databases are stored.
<p/>
The db directory does not come with the NCBI setup, so one needs to create it after installation. If we
place the directory anywhere, we need to change the path correspondingly. For simplicity, we suggest that
it be created under the blast-2.2.13 (version # will vary for future releases) at the same level as data directory.
<p/>
</td></tr>
<tr><td class="tbl_title"><a name="3">3. Program Options</a></td></tr>
<tr><td>
Once the standalone BLAST package is setup, we can "cd" to the directory and issue the <b>"seedtp -"</b>
command to display the program options for this program. Here we list each option in a table and describe
its functions, argument value, and example usage.
<p/>
<table width="600" border="1" class="text">
<tr><td colspan="2" class="tbl_title">Table 3.1</td></tr>
<tr><td class="tbl_title" width="150">Option</td><td width="450"><b>-d</b></td></tr>
<tr><td class="tbl_title">Function</td><td>Specifies the target database to search</td></tr>
<tr><td class="tbl_title">Default</td><td><b>nr</b></td></tr>
<tr><td class="tbl_title">Input format</td><td>Takes database formatted by formatdb, use name without extension</td></tr>
<tr><td class="tbl_title">Example</td><td>To search against est_human, use: -d est_human</td></tr>
<tr><td class="tbl_title">Note</td><td>This is not a mandatory option, search for patterns in a single input sequence does not require this option.</td></tr>
</table>
<p/>
<table width="600" border="1" class="text">
<tr><td colspan="2" class="tbl_title">Table 3.2</td></tr>
<tr><td class="tbl_title" width="150">Option</td><td width="450"><b>-i</b></td></tr>
<tr><td class="tbl_title">Function</td><td>Specifies the input query file</td></tr>
<tr><td class="tbl_title">Default</td><td><b>stdin</b></td></tr>
<tr><td class="tbl_title">Input format</td><td>[File In], file name with extension</td></tr>
<tr><td class="tbl_title">Example</td><td>To take my_pept.txt as input query, use: -i my_pept.txt</td></tr>
<tr><td class="tbl_title">Note</td><td>To using stdin as input, either redirect or pipe the input:<br/>
seedtop -k pat -p patmatchp <input_file
<br/>more input_file| seedtop -k pat -p patmatchp</td></tr>
</table>
<p/>
<table width="600" border="1" class="text">
<tr><td colspan="2" class="tbl_title">Table 3.3</td></tr>
<tr><td class="tbl_title" width="150">Option</td><td width="450"><b>-k</b></td></tr>
<tr><td class="tbl_title">Function</td><td>Specifies the input pattern (Hit File)</td></tr>
<tr><td class="tbl_title">Default</td><td><b>hit_file</b></td></tr>
<tr><td class="tbl_title">Input format</td><td>Complete file name with extension</td></tr>
<tr><td class="tbl_title">Example</td><td>If the pattern file is named my_pat.txt, use: -k my_pat.txt</td></tr>
<tr><td class="tbl_title">Note</td><td>See section 4.1 below for details.</td></tr>
</table>
<p/>
<table width="600" border="1" class="text">
<tr><td colspan="2" class="tbl_title">Table 3.4</td></tr>
<tr><td class="tbl_title" width="150">Option</td><td width="450"><b>-o</b></td></tr>
<tr><td class="tbl_title">Function</td><td>Specifies the output file name</td></tr>
<tr><td class="tbl_title">Default</td><td><b>stdout</b></td></tr>
<tr><td class="tbl_title">Input format</td><td>file name with or without extension</td></tr>
<tr><td class="tbl_title">Example</td><td>To save result in my_output, use: -o my_output</td></tr>
<tr><td class="tbl_title">Note</td><td>Redirection or piping can be used instead.</td></tr>
</table>
<p/>
<table width="600" border="1" class="text">
<tr><td colspan="2" class="tbl_title">Table 3.5</td></tr>
<tr><td class="tbl_title" width="150">Option</td><td width="450"><b>-G</b></td></tr>
<tr><td class="tbl_title">Function</td><td>Specifies the cost to open a gap </td></tr>
<tr><td class="tbl_title">Default</td><td><b>11</b></td></tr>
<tr><td class="tbl_title">Input format</td><td>[Integer]</td></tr>
<tr><td class="tbl_title">Example</td><td>To change this to 12, use: -G 12</td></tr>
<tr><td class="tbl_title">Note</td><td>The choice of -M option determines the available input value for this option as well as that for -E option. Only a selected set is supported. Detailed list is in the blastall document.</td></tr>
</table>
<p/>
<table width="600" border="1" class="text">
<tr><td colspan="2" class="tbl_title">Table 3.6</td></tr>
<tr><td class="tbl_title" width="150">Option</td><td width="450"><b>-E</b></td></tr>
<tr><td class="tbl_title">Function</td><td>Specifies the cost to extend a gap</td></tr>
<tr><td class="tbl_title">Default</td><td><b>1</b></td></tr>
<tr><td class="tbl_title">Input format</td><td>[Integer]</td></tr>
<tr><td class="tbl_title">Example</td><td>To change this to 2, use: -E 2</td></tr>
<tr><td class="tbl_title">Note</td><td>See Table 3.5 for more information</td></tr>
</table>
<p/>
<table width="600" border="1" class="text">
<tr><td colspan="2" class="tbl_title">Table 3.6</td></tr>
<tr><td class="tbl_title" width="150">Option</td><td width="450"><b>-D</b></td></tr>
<tr><td class="tbl_title">Function</td><td>Specifies the cost to decline alignment</td></tr>
<tr><td class="tbl_title">Default</td><td><b>99999</b></td></tr>
<tr><td class="tbl_title">Input format</td><td>[Integer]</td></tr>
<tr><td class="tbl_title">Example</td><td>N/A</td></tr>
<tr><td class="tbl_title">Note</td><td>Functions similar to the -L option in blastpgp. If enabled, it would implement Dr. Altschul's 3-parameter gap model for scoring.</td></tr>
</table>
<p/>
<table width="600" border="1" class="text">
<tr><td colspan="2" class="tbl_title">Table 3.7</td></tr>
<tr><td class="tbl_title" width="150">Option</td><td width="450"><b>-X</b></td></tr>
<tr><td class="tbl_title">Function</td><td>Specifies X dropoff value for gapped alignment (in bits)</td></tr>
<tr><td class="tbl_title">Default</td><td><b>15</b></td></tr>
<tr><td class="tbl_title">Input format</td><td>[Integer]</td></tr>
<tr><td class="tbl_title">Example</td><td>To increase this dropoff value to 20, use: -X 20 </td></tr>
<tr><td class="tbl_title">Note</td><td>Increasing this value may enable one to see a longer alignment</td></tr>
</table>
<p/>
<table width="600" border="1" class="text">
<tr><td colspan="2" class="tbl_title">Table 3.8</td></tr>
<tr><td class="tbl_title" width="150">Option</td><td width="450"><b>-S</b></td></tr>
<tr><td class="tbl_title">Function</td><td>Specifies cutoff cost</td></tr>
<tr><td class="tbl_title">Default</td><td><b>30</b></td></tr>
<tr><td class="tbl_title">Input format</td><td>[Integer]</td></tr>
<tr><td class="tbl_title">Example</td><td>N/A</td></tr>
<tr><td class="tbl_title">Note</td><td>Currently it is overridden in pseed3.c. It could allow the user to control the score threshold applied to the part of the alignment that does not include the pattern in deciding which alignment(s) to report. </td></tr>
</table>
<p/>
<table width="600" border="1" class="text">
<tr><td colspan="2" class="tbl_title">Table 3.9</td></tr>
<tr><td class="tbl_title" width="150">Option</td><td width="450"><b>-C</b></td></tr>
<tr><td class="tbl_title">Function</td><td>Score only or not</td></tr>
<tr><td class="tbl_title">Default</td><td><b>1</b></td></tr>
<tr><td class="tbl_title">Input format</td><td>[Integer]</td></tr>
<tr><td class="tbl_title">Example</td><td>N/A</td></tr>
<tr><td class="tbl_title">Note</td><td>This is relevant only to searches with -p seed(p) or -p patseed(p). NOT implemented yet. </td></tr>
</table>
<p/>
<table width="600" border="1" class="text">
<tr><td colspan="2" class="tbl_title">Table 3.10</td></tr>
<tr><td class="tbl_title" width="150">Option</td><td width="450"><b>-I</b></td></tr>
<tr><td class="tbl_title">Function</td><td>Whether to Show GI's in deflines</td></tr>
<tr><td class="tbl_title">Default</td><td><b>F</b></td></tr>
<tr><td class="tbl_title">Input format</td><td>[T/F]</td></tr>
<tr><td class="tbl_title">Example</td><td>To display GI in the deflines, use: -I T</td></tr>
<tr><td class="tbl_title">Note</td><td>Relevant only to searches with -p seed(p) or -p patseed(p)</td></tr>
</table>
<p/>
<table width="600" border="1" class="text">
<tr><td colspan="2" class="tbl_title">Table 3.11</td></tr>
<tr><td class="tbl_title" width="150">Option</td><td width="450"><b>-e</b></td></tr>
<tr><td class="tbl_title">Function</td><td>Specifies the expectation value (E) cutoff</td></tr>
<tr><td class="tbl_title">Default</td><td><b>10.0</b></td></tr>
<tr><td class="tbl_title">Input format</td><td>[Real]</td></tr>
<tr><td class="tbl_title">Example</td><td>To set this to 0.001, use: -e 0.001 or -e 1e-3</td></tr>
<tr><td class="tbl_title">Note</td><td>Relevant only to searches with -p seed(p) or -p patseed(p)</td></tr>
</table>
<p/>
<table width="600" border="1" class="text">
<tr><td colspan="2" class="tbl_title">Table 3.12</td></tr>
<tr><td class="tbl_title" width="150">Option</td><td width="450"><b>-J</b></td></tr>
<tr><td class="tbl_title">Function</td><td>Whether to believe the query defline or not</td></tr>
<tr><td class="tbl_title">Default</td><td><b>F</b></td></tr>
<tr><td class="tbl_title">Input format</td><td>[T/F]</td></tr>
<tr><td class="tbl_title">Example</td><td>To set this to true, use: -J T</td></tr>
<tr><td class="tbl_title">Note</td><td>To save SeqAlign object requires -J T </td></tr>
</table>
<p/>
<table width="600" border="1" class="text">
<tr><td colspan="2" class="tbl_title">Table 3.13</td></tr>
<tr><td class="tbl_title" width="150">Option</td><td width="450"><b>-O</b></td></tr>
<tr><td class="tbl_title">Function</td><td>Specifies the output file for SeqAlign object </td></tr>
<tr><td class="tbl_title">Default</td><td><b>Optional</b></td></tr>
<tr><td class="tbl_title">Input format</td><td>[File Out]</td></tr>
<tr><td class="tbl_title">Example</td><td>N/A</td></tr>
<tr><td class="tbl_title">Note</td><td>Relevant only to searches with -p seed(p) or -p patseed(p). NOT implement yet.</td></tr>
</table>
<p/>
<table width="600" border="1" class="text">
<tr><td colspan="2" class="tbl_title">Table 3.14</td></tr>
<tr><td class="tbl_title" width="150">Option</td><td width="450"><b>-M</b></td></tr>
<tr><td class="tbl_title">Function</td><td>Specifies which matrix file to use</td></tr>
<tr><td class="tbl_title">Default</td><td><b>BLOSUM62</b></td></tr>
<tr><td class="tbl_title">Input format</td><td>[String]</td></tr>
<tr><td class="tbl_title">Example</td><td>To set matrix to PAM30, use: -M PAM30</td></tr>
<tr><td class="tbl_title">Note</td><td>Relevant to seedp/patseedp searches, only a limited set is supported</td></tr>
</table>
<p/>
<table width="600" border="1" class="text">
<tr><td colspan="2" class="tbl_title">Table 3.15</td></tr>
<tr><td class="tbl_title" width="150">Option</td><td width="450"><b>-p</b></td></tr>
<tr><td class="tbl_title">Function</td><td>Specifies which subprogram to run</td></tr>
<tr><td class="tbl_title">Default</td><td><b>patmatchp</b></td></tr>
<tr><td class="tbl_title">Input format</td><td>[String]</td></tr>
<tr><td class="tbl_title">Example</td><td>To find protein patterns in a database, use: -p patternp</td></tr>
<tr><td class="tbl_title">Note</td><td>Choices for nucleotide searches: patmatch, pattern, seed, and patseed<br/>
Choices for protein searches: patmatchp, patternp, seedp, and patseedp</td></tr>
</table>
<p/>
<table width="600" border="1" class="text">
<tr><td colspan="2" class="tbl_title">Table 3.16</td></tr>
<tr><td class="tbl_title" width="150">Option</td><td width="450"><b>-r</b></td></tr>
<tr><td class="tbl_title">Function</td><td>Specifies the reward for a match</td></tr>
<tr><td class="tbl_title">Default</td><td><b>10</b></td></tr>
<tr><td class="tbl_title">Input format</td><td>[Integer]</td></tr>
<tr><td class="tbl_title">Example</td><td>To increase the reward to 20, use: -r 20</td></tr>
<tr><td class="tbl_title">Note</td><td>Relevant only to searches with -p seed(p) or -p patseed<br/>For nucleotide searches only.</td></tr>
</table>
<p/>
<table width="600" border="1" class="text">
<tr><td colspan="2" class="tbl_title">Table 3.17</td></tr>
<tr><td class="tbl_title" width="150">Option</td><td width="450"><b>-q</b></td></tr>
<tr><td class="tbl_title">Function</td><td>Specifies the cost for a mismatch</td></tr>
<tr><td class="tbl_title">Default</td><td><b>-10</b></td></tr>
<tr><td class="tbl_title">Input format</td><td>[Integer]</td></tr>
<tr><td class="tbl_title">Example</td><td>To increase the penalty to -15, use: -q -15</td></tr>
<tr><td class="tbl_title">Note</td><td>For nucleotide search with seed/patseed only.</td></tr>
</table>
<p/>
<table width="600" border="1" class="text">
<tr><td colspan="2" class="tbl_title">Table 3.18</td></tr>
<tr><td class="tbl_title" width="150">Option</td><td width="450"><b>-F</b></td></tr>
<tr><td class="tbl_title">Function</td><td>Whether to filter query sequence with SEG</td></tr>
<tr><td class="tbl_title">Default</td><td><b>F</b></td></tr>
<tr><td class="tbl_title">Input format</td><td>[T/F]</td></tr>
<tr><td class="tbl_title">Example</td><td>To activate filter, use: -F T</td></tr>
<tr><td class="tbl_title">Note</td><td>Relevant to seedp/patseedp searches only.</td></tr>
</table>
<p/>
</td></tr>
<tr><td class="tbl_title"><a name="4">4. Execution and Practical Usage</a></td></tr>
<tr><td>
The most useful functionalities of seedtop are patmatchp and patternp. Since patseedp
does not generate the actual alignment and its function is already incorporated in blastpgp, we
will not cover it here. The functionality for searching with nucleotide entries are similar to
protein searches, we will only provide a couple simple examples.
<br /><br /></td></tr>
<tr><td class="tbl_title"><a name="4.1"> 4.1 Pattern specification</a></td></tr>
<tr><td>
The pattern input file is unique for seedtop. Each pattern is specified by two lines,
ID initialed for identification and PA initialed for pattern. The pattern specification uses the
ProSite syntax. Multiple patterns can be used, each separated by a single line with a "/".
<p/>
Single pattern specification:
<blockquote class="fixed">
ID Cyclic nucleotide-binding domain signature 2.<br/>
PA [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV].<br/>
</blockquote>
<p/>
Multiple pattern specification:
<blockquote class="fixed">
ID Cyclic nucleotide-binding domain signature 2.<br/>
PA [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV].<br/>
/<br/>
ID Cyclic nucleotide-binding domain signature 1<br/>
PA [LIVM]-[VIC]-x-{H}-G-[DENQTA]-x-[GAC]-{L}-x-[LIVMFY](4)-x(2)-G<br/>
</blockquote>
<p/>
Pattern lines should be less than 100 letters long. Longer patterns can be broken
up into two or more PA lines. Multiple individual patterns should be separated by a line
containing a single backslash (/). Other general pattern rules can be summarized as the
following:
<blockquote>
<table class="text2" width="500">
<tr><td width="60"><b>Symbol</b></td><td><b>Meaning</b></td></tr>
<tr><td><b>[]</b></td><td>marks a single position, match to anyone in the bracket is acceptable</td></tr>
<tr><td><b>(x,y)</b></td><td>marks a range for the residue(s) before it, matching within the range is acceptable</td></tr>
<tr><td><b>(x,)</b></td><td>represents range with no upper limit for the residue(s) before it</td></tr>
<tr><td><b>(x)</b></td><td>represents exact number of matches for the residue(s) before it</td></tr>
<tr><td><b>{}</b></td><td>marks a single residue, residues in the braces should be excluded</td></tr>
<tr><td><b>-</b></td><td>separates the individual positions in the pattern</td></tr>
<tr><td><b>.</b></td><td>used at the end marks the end of a pattern</td></tr>
<tr><td><b>></b></td><td>symbol at the end marks an incomplete pattern (optional)</td></tr>
</table>
</blockquote>
</td></tr>
<tr><td class="tbl_title"><a name="4.2"> 4.2 patmatchp</a></td></tr>
<tr><td>
This function matches patterns found in an input pattern file and identifies the
pattern occurrences in an input protein sequence. The sample command line below (given
along with its output) takes an input pattern named pattern.txt, searches against the
input query.aa target sequence, and saves the output in a file named query.out.
<blockquote>
seedtop -k pattern.txt -i query.aa -p patmatchp -o query.out<br/>
<br/>
Name Cyclic nucleotide-binding domain signature 2.<br/>
Pattern [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV].<br/>
At position 521 of query sequence<br/>
Name Cyclic nucleotide-binding domain signature 1<br/>
Pattern [LIVM]-[VIC]-x-{H}-G-[DENQTA]-x-[GAC]-{L}-x-[LIVMFY](4)-x(2)-G<br/>
At position 483 of query sequence<br/>
</blockquote>
The result only lists the pattern starting positions in the query sequence.
<br /><br />
</td></tr>
<tr><td class="tbl_title"><a name="4.3"> 4.3 patternp</a></td></tr>
<tr><td>
This function matches patterns in an input pattern file against an input database
and reports back the database entries containing one or more of the input patterns as
well as the pattern locations. The sample command line below (given with its partial output)
takes an input pattern named pattern.txt, searches against the input refseq_protein database,
the identified entries with pattern matches are saved in the output file db.out.
<blockquote>
seedtop -k pattern.txt -d refseq_protein -p patternp -o db.out<br />
<br />
seqno=892602 gi|33859524|ref|NP_034048.1|<br />
<br />
ID Cyclic nucleotide-binding domain signature 1<br />
PA [LIVM]-[VIC]-x-{H}-G-[DENQTA]-x-[GAC]-{L}-x-[LIVMFY](4)-x(2)-G<br />
HI (449 450) (452 454) (456 457) (459 462) (465 465)<br />
seqno=892873 gi|51470807|ref|XP_290552.4|<br />
<br />
ID Cyclic nucleotide-binding domain signature 1<br />
PA [LIVM]-[VIC]-x-{H}-G-[DENQTA]-x-[GAC]-{L}-x-[LIVMFY](4)-x(2)-G<br />
HI (374 375) (377 379) (381 382) (384 387) (390 390)<br />
</blockquote>
<p/>
Each hit is described in the following lines:<ul>
<li><b>seqno:</b> seqid for the database sequence with pattern matches</li>
<li><b>ID:</b> Pattern ID, reiterated pattern input</li>
<li><b>PA:</b> Pattern, reiterated pattern input</li>
<li><b>HI:</b> Hit position on the db sequence, regions broken up by X</li></ul>
</td></tr>
<tr><td class="tbl_title"><a name="4.4"> 4.4 patseedp</a></td></tr>
<tr><td>
This function takes three inputs, an input pattern, a query protein sequence
with the pattern, and a protein sequence database. It identifies the pattern in the
query and aligns the query against the database entries that contains the same pattern.
It reports the pattern position in the query, the total number of pattern occurrences
in the database, and the actual database entries with pattern and alignment to the
input query. Specifically, it reports the seqid of the database entry, its alignment
(with the query) E-value, scores, and pattern position.
<blockquote>
seedtop -k pattern.txt -d refseq_protein -p patseedp -o pat_db.out -i query_aa.txt<br />
<br />
1 occurrence(s) of pattern in query<br />
<br />
Name Cyclic nucleotide-binding domain signature 2.<br />
Pattern [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV].<br />
At position 488 of query sequence<br />
effective database length=3.1e+008<br />
pattern probability=3.4e-008<br />
lengthXprobability=1.0e+001<br />
<br />
Number of occurrences of pattern in the database is 265<br />
892602 gi|33859524|ref|NP_034048.1|<br />
0 Total Score 3279 Outside Pattern Score 3162 Match start in db seq 488<br />
Extent in query seq 1 631 Extent in db seq 1 631<br />
<br />
1 occurrence(s) of pattern in query<br />
<br />
Name Cyclic nucleotide-binding domain signature 1<br />
Pattern [LIVM]-[VIC]-x-{H}-G-[DENQTA]-x-[GAC]-{L}-x-[LIVMFY](4)-x(2)-G<br />
At position 450 of query sequence<br />
effective database length=3.1e+008<br />
pattern probability=7.0e-008<br />
lengthXprobability=2.2e+001<br />
<br />
Number of occurrences of pattern in the database is 247<br />
892602 gi|33859524|ref|NP_034048.1|<br />
0 Total Score 3279 Outside Pattern Score 3188 Match start in db seq 450<br />
Extent in query seq 1 631 Extent in db seq 1 631
</blockquote>
<p/>
The input pattern.txt file contains two patterns, so the result contains two
sections, one for each pattern. Only one database match is shown.
</td></tr>
<tr><td class="tbl_title"><a name="4.5"> 4.5. seedp</a></td></tr>
<tr><td>
This function is similar to patseedp. The only difference is that the pattern
file should specify the pattern position in the input query sequence. The output from
the patternp can be used for this purpose. This specifies which pattern is to be used
during the search.
<p/>
An actual command line and pattern file is listed below. Output is omitted since
it is essential the same as that from patseedp given above.
<blockquote>
seedtop -p seedp -k pattern2a.txt -d refseq_protein -i q_aa.txt -o seed.out<br />
<br />
ID Cyclic nucleotide-binding domain signature 1<br />
PA [LIVM]-[VIC]-x-{H}-G-[DENQTA]-x-[GAC]-{L}-x-[LIVMFY](4)-x(2)-G<br />
HI (450 451) (453 455) (457 458) (460 463) (466 466)<br />
</blockquote>
<p/>
The seedp functionality has been incorporated into standalone blastpgp, on the
BLAST web server, it is called Pattern-Hit-Initiated BLAST or PHI-BLAST. The differences
are that blastpgp does not report the total number of pattern occurrences in the database,
but it does generate actual sequence alignments. The implementation in blastpgp provides
more functionality in that the results of the (first) round of PHI-BLAST search can be
used seamlessly as the start of a PSI-BLAST iterated search.
</td></tr>
<tr><td class="tbl_title"><a name="4.6"> 4.6. For nucleotide searches</a></td></tr>
<tr><td>
Search for nucleotide patterns using seedtop is very similar to what described
above for proteins queries. The function names for nucleotide have no terminating p.
<p/>
</td></tr>
<tr><td class="tbl_title"><a name="5">5. Technical Support</a></td></tr>
<tr><td>
For additional questions and comments, please write to:
<blockquote>
<a href="mailto:blast-help@ncbi.nlm.nih.gov">blast-help@ncbi.nlm.nih.gov</a>
</blockquote>
Questions and inquries on other NCBI resources should be sent to:
<blockquote>
<a href="mailto:info@ncbi.nlm.nih.gov">info@ncbi.nlm.nih.gov</a>
</blockquote>
</td></tr>
<tr><td align="right" class="medium2"><script type="text/javascript">
//<![CDATA[
document.write("Updated on "); document.write(document.lastModified);
//]]>
</script></td></tr>
</tbody>
</table>
</body>
</html>
|