Website Keyword Search Engine Script
Simple Search (v1.0) — Matt Wright's site search engine that indexed and searched plain text files on a web server. It scanned specified directories for keyword matches and returned results with page titles extracted from HTML <title> tags.
Simple Search read files on every request — acceptable for a 50-page site in 1997, unusable at any real scale. Modern tools pre-build indexes and return results in under 50 ms.
| Solution | Type | Best For | Typo Tolerance | Cost |
|---|---|---|---|---|
| Algolia | SaaS | E-commerce, apps with instant search | Yes | Free 10K req/mo, then ~$1/1K |
| Elasticsearch | Self-hosted / Cloud | Large datasets, log analytics, full-text search | Yes | Free (self-hosted), Cloud from $95/mo |
| Meilisearch | Self-hosted / Cloud | Apps, e-commerce (Algolia alternative) | Yes | Free (self-hosted), Cloud from $0 |
| Typesense | Self-hosted / Cloud | High-traffic apps, faceted search | Yes | Free (self-hosted), $19+/mo cloud |
| Lunr.js | Client-side JS (~8 KB) | Static sites, offline-capable search | Basic (stemming) | Free (MIT) |
| Pagefind | Static build + JS (~10 KB) | Hugo, Jekyll, Astro — zero server | Basic | Free |
For a small static site (under 500 pages), Lunr.js or Pagefind works well. For anything dynamic or over 10K documents, Meilisearch, Typesense, or Algolia are the practical choices.
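The difference between scanning files on every request and querying a pre-built index can be sketched in a few lines. This is a minimal illustration of an inverted index, not how any of the tools above are actually implemented; `tokenize`, `buildIndex`, `searchIndex`, and the sample documents are hypothetical names chosen for the sketch.

```javascript
// Minimal inverted index: maps each word to the set of document ids containing it.
// Illustrative only; real engines add ranking, stemming, and typo tolerance.

function tokenize(text) {
  // Lowercase, then split on anything that is not an ASCII letter or digit
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

function buildIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const word of tokenize(doc.text)) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(doc.id);
    }
  }
  return index;
}

function searchIndex(index, query) {
  // AND semantics: intersect the id sets of every query word
  const words = tokenize(query);
  if (words.length === 0) return [];
  let ids = null;
  for (const word of words) {
    const matches = index.get(word) || new Set();
    ids = ids === null
      ? new Set(matches)
      : new Set([...ids].filter(id => matches.has(id)));
  }
  return [...ids].sort();
}

// Hypothetical sample documents
const docs = [
  { id: 'a.html', text: 'Perl CGI scripts' },
  { id: 'b.html', text: 'PHP and Perl tutorials' },
];
const index = buildIndex(docs);

console.log(searchIndex(index, 'perl'));     // [ 'a.html', 'b.html' ]
console.log(searchIndex(index, 'perl cgi')); // [ 'a.html' ]
```

The index is built once, at publish time, so each query costs a few map lookups instead of a pass over every file.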
Simple Search is a Perl CGI script that provides keyword search for static HTML and text files. On each request, it reads through files in the configured directories, matches keywords, and returns a list of links using the <title> tag of each matching document.
The script opens every file in the specified directories, converts content to lowercase, and tests each search term against it. Boolean AND requires all terms to match; OR requires at least one. Results are displayed as a flat list of page-title links.
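The matching rule described above can be sketched as a single function. This is an illustration under simplifying assumptions, not the original script's code: `matchesTerms` is a hypothetical name, and it uses plain substring matching rather than word boundaries.

```javascript
// Hypothetical helper mirroring the described logic: lowercase the content,
// then require every term (AND) or at least one term (OR) to appear in it.
function matchesTerms(content, terms, mode = 'AND') {
  if (terms.length === 0) return false; // nothing to search for
  const haystack = content.toLowerCase();
  const found = term => haystack.includes(term.toLowerCase());
  return mode === 'OR' ? terms.some(found) : terms.every(found);
}

const page = 'Perl and CGI scripts for your site';
console.log(matchesTerms(page, ['perl', 'cgi'], 'AND')); // true
console.log(matchesTerms(page, ['perl', 'php'], 'AND')); // false
console.log(matchesTerms(page, ['perl', 'php'], 'OR'));  // true
```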
| File | Description |
|---|---|
| search.pl | Main Perl script that performs the search and displays results |
| search.html | HTML form template for the search interface |
| README | Installation instructions and configuration guide |
Search for single keywords or phrases across all specified documents on your website.
AND/OR operators to combine multiple keywords for refined results.
Automatically extracts page titles from HTML documents to display meaningful result links.
Configure multiple directories to search, covering your entire site structure.
Searches both plain text (.txt) and HTML files (.html, .htm).
Modify the results page template to match your website's design.
| Option | Description | Example |
|---|---|---|
| Single Keyword | Search for a single word in all documents | perl |
| Multiple Keywords (AND) | Find documents containing ALL specified keywords | perl cgi script |
| Multiple Keywords (OR) | Find documents containing ANY of the keywords | perl OR php OR python |
| Case Insensitive | All searches are case-insensitive by default | PERL = perl = Perl |
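1. Upload search.pl to your cgi-bin directory.
2. Check that the first line of the script points to your Perl interpreter (typically #!/usr/bin/perl).
3. Edit the @directories array with the paths to directories you want to search.
4. Set the $baseurl variable to match your website's URL structure.
5. Make the script executable: chmod 755 search.pl
6. Use search.html to create your search interface.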
<!DOCTYPE html>
<html>
<head>
<title>Search Our Site</title>
</head>
<body>
<h1>Search</h1>
<form action="/cgi-bin/search.pl" method="GET">
<p>
<label for="keywords">Enter Keywords:</label><br>
<input type="text" name="keywords" id="keywords" size="40">
</p>
<p>
Search Type:<br>
<input type="radio" name="boolean" value="AND" id="and" checked>
<label for="and">Match ALL keywords (AND)</label><br>
<input type="radio" name="boolean" value="OR" id="or">
<label for="or">Match ANY keyword (OR)</label>
</p>
<p>
<input type="submit" value="Search">
<input type="reset" value="Clear">
</p>
</form>
<!-- Bootstrap 5 version -->
<form action="/cgi-bin/search.pl" method="GET" class="needs-validation">
<div class="mb-3">
<label for="keywords" class="form-label">Enter Keywords</label>
<input type="text" class="form-control" name="keywords" id="keywords"
placeholder="Search..." required>
</div>
<div class="mb-3">
<label class="form-label">Search Type</label>
<div class="form-check">
<input class="form-check-input" type="radio" name="boolean"
value="AND" id="and" checked>
<label class="form-check-label" for="and">
Match ALL keywords (AND)
</label>
</div>
<div class="form-check">
<input class="form-check-input" type="radio" name="boolean"
value="OR" id="or">
<label class="form-check-label" for="or">
Match ANY keyword (OR)
</label>
</div>
</div>
<button type="submit" class="btn btn-primary">
<i class="bi bi-search"></i> Search
</button>
</form>
</body>
</html>
#!/usr/bin/perl
use strict;
use warnings;
use CGI;
use File::Find;
my $cgi = CGI->new;
# Configuration
my @directories = ('/var/www/html/docs', '/var/www/html/pages');
my $baseurl = 'http://example.com';
my @extensions = qw(html htm txt);
# Get search parameters
my $keywords = $cgi->param('keywords') || '';
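my $boolean = uc($cgi->param('boolean') || 'AND');
$boolean = 'AND' unless $boolean eq 'OR'; # accept only AND or OR; the value is echoed into the page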
# Security: sanitize input
$keywords =~ s/[^\w\s]//g;
my @terms = split(/\s+/, lc($keywords));
# Output HTML header
print $cgi->header('text/html');
print $cgi->start_html('Search Results');
print "<h1>Search Results</h1>\n";
if (!@terms) {
print "<p>Please enter search keywords.</p>\n";
print $cgi->end_html;
exit;
}
print "<p>Searching for: <strong>$keywords</strong> ($boolean)</p>\n";
# Search files
my @results;
find(sub {
return unless -f;
my $file = $File::Find::name;
# Check extension
my ($ext) = $file =~ /\.(\w+)$/;
return unless $ext && grep { $_ eq lc($ext) } @extensions;
# Read file content
open(my $fh, '<', $file) or return;
my $content = do { local $/; <$fh> };
close($fh);
# No lc() here: the /i matches below are already case-insensitive, and the title keeps its original case
# Check for matches
my $match = 0;
if ($boolean eq 'AND') {
$match = 1;
for my $term (@terms) {
unless ($content =~ /\b\Q$term\E\b/i) {
$match = 0;
last;
}
}
} else { # OR
for my $term (@terms) {
if ($content =~ /\b\Q$term\E\b/i) {
$match = 1;
last;
}
}
}
if ($match) {
# Extract title
my ($title) = $content =~ /<title>([^<]+)<\/title>/i;
$title ||= $file;
# Convert path to URL
my $url = $file;
$url =~ s{^/var/www/html}{$baseurl};
push @results, { title => $title, url => $url };
}
}, @directories);
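# Display results
if (@results) {
print "<p>Found " . scalar(@results) . " result(s):</p>\n";
print "<ul>\n";
for my $result (@results) {
# Escape values before writing them into HTML
my $title = $cgi->escapeHTML($result->{title});
my $url = $cgi->escapeHTML($result->{url});
print qq{<li><a href="$url">$title</a></li>\n};
}
print "</ul>\n";
} else {
print "<p>No results found.</p>\n";
}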
print $cgi->end_html;
exit 0;
<?php
/**
* Simple Search - PHP Version
*/
// Configuration
$config = [
'directories' => [
'/var/www/html/docs',
'/var/www/html/pages'
],
'baseurl' => 'https://example.com',
'extensions' => ['html', 'htm', 'txt', 'php'],
'max_results' => 100
];
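// Get search parameters (non-string values such as keywords[]=x are ignored)
$keywords = is_string($_GET['keywords'] ?? null) ? $_GET['keywords'] : '';
// Accept only AND or OR, since the value is echoed into the page
$boolean = (($_GET['boolean'] ?? 'AND') === 'OR') ? 'OR' : 'AND';
// Security: sanitize input (preg_replace returns null on invalid UTF-8)
$keywords = preg_replace('/[^\w\s]/u', '', $keywords) ?? '';
// Split on any run of whitespace and drop empty strings
$terms = preg_split('/\s+/', strtolower($keywords), -1, PREG_SPLIT_NO_EMPTY);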
function searchFiles($directories, $extensions, $baseurl) {
$files = [];
foreach ($directories as $dir) {
if (!is_dir($dir)) continue;
$iterator = new RecursiveIteratorIterator(
new RecursiveDirectoryIterator($dir)
);
foreach ($iterator as $file) {
if (!$file->isFile()) continue;
$ext = strtolower($file->getExtension());
if (!in_array($ext, $extensions)) continue;
$files[] = [
'path' => $file->getPathname(),
'url' => str_replace('/var/www/html', $baseurl, $file->getPathname())
];
}
}
return $files;
}
function extractTitle($content, $fallback) {
if (preg_match('/<title>([^<]+)<\/title>/i', $content, $matches)) {
return htmlspecialchars($matches[1]);
}
return htmlspecialchars(basename($fallback));
}
function matchesSearch($content, $terms, $boolean) {
$content = strtolower($content);
if ($boolean === 'AND') {
foreach ($terms as $term) {
if (stripos($content, $term) === false) {
return false;
}
}
return true;
} else { // OR
foreach ($terms as $term) {
if (stripos($content, $term) !== false) {
return true;
}
}
return false;
}
}
// Perform search
$results = [];
if (!empty($terms)) {
$files = searchFiles(
$config['directories'],
$config['extensions'],
$config['baseurl']
);
foreach ($files as $file) {
$content = @file_get_contents($file['path']);
if ($content === false) continue;
if (matchesSearch($content, $terms, $boolean)) {
$results[] = [
'title' => extractTitle($content, $file['path']),
'url' => $file['url']
];
if (count($results) >= $config['max_results']) {
break;
}
}
}
}
?>
<!DOCTYPE html>
<html>
<head>
<title>Search Results</title>
</head>
<body>
<h1>Search Results</h1>
<?php if (empty($terms)): ?>
<p>Please enter search keywords.</p>
<?php else: ?>
<p>Searching for: <strong><?= htmlspecialchars($keywords) ?></strong>
(<?= $boolean ?>)</p>
<?php if (!empty($results)): ?>
<p>Found <?= count($results) ?> result(s):</p>
<ul>
<?php foreach ($results as $result): ?>
<li>
<a href="<?= htmlspecialchars($result['url']) ?>">
<?= $result['title'] ?>
</a>
</li>
<?php endforeach; ?>
</ul>
<?php else: ?>
<p>No results found.</p>
<?php endif; ?>
<?php endif; ?>
</body>
</html>
/**
* Simple Search - JavaScript Version
* Client-side search for static sites (requires pre-built index)
*/
class SimpleSearch {
constructor(options = {}) {
this.options = {
indexUrl: '/search-index.json',
inputSelector: '#search-input',
resultsSelector: '#search-results',
minChars: 2,
maxResults: 20,
highlightMatches: true,
...options
};
this.index = [];
this.init();
}
async init() {
await this.loadIndex();
this.bindEvents();
}
async loadIndex() {
try {
const response = await fetch(this.options.indexUrl);
this.index = await response.json();
} catch (error) {
console.error('Failed to load search index:', error);
}
}
bindEvents() {
const input = document.querySelector(this.options.inputSelector);
if (input) {
input.addEventListener('input', (e) => this.handleSearch(e.target.value));
// Handle form submission
input.closest('form')?.addEventListener('submit', (e) => {
e.preventDefault();
this.handleSearch(input.value);
});
}
}
handleSearch(query) {
if (query.length < this.options.minChars) {
this.displayResults([]);
return;
}
const results = this.search(query);
this.displayResults(results, query);
}
search(query) {
const terms = query.toLowerCase().split(/\s+/).filter(t => t.length > 0);
if (terms.length === 0) return [];
return this.index
.map(item => ({
...item,
score: this.calculateScore(item, terms)
}))
.filter(item => item.score > 0)
.sort((a, b) => b.score - a.score)
.slice(0, this.options.maxResults);
}
calculateScore(item, terms) {
let score = 0;
const titleLower = (item.title || '').toLowerCase();
const contentLower = (item.content || '').toLowerCase();
for (const term of terms) {
// Escape regex metacharacters so input like "c++" cannot break the pattern
const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
// Title matches are worth more
if (titleLower.includes(term)) {
score += 10;
// Exact word match in title
if (new RegExp(`\\b${escaped}\\b`).test(titleLower)) {
score += 5;
}
}
// Content matches
const contentMatches = (contentLower.match(new RegExp(escaped, 'g')) || []).length;
score += Math.min(contentMatches, 5); // Cap at 5 points per term
}
return score;
}
displayResults(results, query = '') {
const container = document.querySelector(this.options.resultsSelector);
if (!container) return;
if (results.length === 0) {
container.innerHTML = query.length >= this.options.minChars
? '<p>No results found.</p>'
: '';
return;
}
// Build one regex for all terms so inserted <mark> tags are never re-matched
const terms = query.split(/\s+/).filter(Boolean)
.map(term => term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
const regex = this.options.highlightMatches && terms.length
? new RegExp(`(${terms.join('|')})`, 'gi')
: null;
const html = results.map(result => {
let title = result.title;
let snippet = result.snippet || '';
if (regex) {
title = title.replace(regex, '<mark>$1</mark>');
snippet = snippet.replace(regex, '<mark>$1</mark>');
}
// Minimal result markup; adapt tags and classes to your page
return `
<li><a href="${result.url}">${title}</a><p>${snippet}</p></li>
`;
}).join('');
container.innerHTML = `
<p>Found ${results.length} result(s):</p>
<ul>${html}</ul>
`;
}
}
// Usage
const search = new SimpleSearch({
indexUrl: '/search-index.json',
inputSelector: '#search-input',
resultsSelector: '#search-results'
});
// Building a search index (Node.js build script example)
/*
const fs = require('fs');
const path = require('path');
const cheerio = require('cheerio');
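// Recursively collect every file path under a directory
function walkDir(dir) {
return fs.readdirSync(dir, { withFileTypes: true }).flatMap(entry => {
const fullPath = path.join(dir, entry.name);
return entry.isDirectory() ? walkDir(fullPath) : [fullPath];
});
}
function buildIndex(directory) {
const index = [];
const files = walkDir(directory);
files.forEach(file => {
if (!file.endsWith('.html')) return;
const content = fs.readFileSync(file, 'utf-8');
const $ = cheerio.load(content);
index.push({
// Site-relative URL with forward slashes on every platform
url: '/' + path.relative(directory, file).split(path.sep).join('/'),
title: $('title').text() || path.basename(file),
content: $('body').text().replace(/\s+/g, ' ').slice(0, 500),
snippet: $('meta[name="description"]').attr('content') || ''
});
});
return index;
}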
fs.writeFileSync('search-index.json', JSON.stringify(buildIndex('./public')));
*/
Community-contributed enhancements and localizations:
A modified version with built-in debugging options to help troubleshoot search issues. Created by the MSA help list community.
A localized version modified to search Japanese character sets (Shift-JIS, EUC-JP).
To search subdirectories recursively, use the File::Find module (as shown in the code examples), or list all subdirectories explicitly in the configuration.
To exclude files or directories from the search, define an exclusion list such as @exclude = ('admin', 'private', '*.bak'); and check each file against it before searching.
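An exclusion check of this kind might look like the following sketch. The helper names `globToRegExp` and `isExcluded` are hypothetical, only the `*` wildcard is supported, and each pattern is tested against individual path segments; these are assumptions made for the example, not behavior of the original script.

```javascript
// Convert a simple glob (only * is supported here) into an anchored RegExp
function globToRegExp(glob) {
  const escaped = glob
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters except *
    .replace(/\*/g, '.*');                 // * matches any run of characters
  return new RegExp(`^${escaped}$`);
}

// True if any segment of the path matches any exclusion pattern
function isExcluded(filePath, excludePatterns) {
  const segments = filePath.split('/').filter(Boolean);
  return excludePatterns.some(pattern => {
    const regex = globToRegExp(pattern);
    return segments.some(segment => regex.test(segment));
  });
}

const exclude = ['admin', 'private', '*.bak'];
console.log(isExcluded('/var/www/html/admin/index.html', exclude)); // true
console.log(isExcluded('/var/www/html/docs/old.bak', exclude));     // true
console.log(isExcluded('/var/www/html/docs/guide.html', exclude));  // false
```

Matching per segment means `admin` excludes the whole `admin` directory without also excluding a file named `administrator.html`.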
To highlight matched terms in the results, wrap each match in <mark> tags.
To search PDF files, extract their text with pdftotext first. For Word documents, convert to text with a library. For sites with many binary formats, a dedicated search engine (Elasticsearch, Solr) is more practical.
To make searches case-sensitive, remove the lc() calls that convert text to lowercase before comparison, along with the /i modifier on the match expressions. You can also add a checkbox to the search form and check that parameter in the script.
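A user-controlled toggle could be sketched like this. `containsTerm` and its `caseSensitive` flag are hypothetical names; the flag stands in for whatever form parameter the checkbox would submit.

```javascript
// Substring test that is case-insensitive unless the caller opts in
function containsTerm(content, term, caseSensitive = false) {
  if (caseSensitive) return content.includes(term);
  return content.toLowerCase().includes(term.toLowerCase());
}

console.log(containsTerm('Learning Perl', 'perl'));       // true
console.log(containsTerm('Learning Perl', 'perl', true)); // false
console.log(containsTerm('Learning Perl', 'Perl', true)); // true
```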
To paginate results, add a page parameter to the query string. Calculate the offset from the page number and the results-per-page count, slice the results array, and generate pagination links. Example: ?keywords=perl&page=2.
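The offset calculation and link generation can be sketched as follows. The function names `paginate` and `pageLinks` and the default of 10 results per page are assumptions for this example.

```javascript
// Slice a results array for the requested page
function paginate(results, page, perPage = 10) {
  const totalPages = Math.max(1, Math.ceil(results.length / perPage));
  // Clamp the requested page into range; non-numeric input falls back to 1
  const current = Math.min(Math.max(parseInt(page, 10) || 1, 1), totalPages);
  const offset = (current - 1) * perPage;
  return { current, totalPages, items: results.slice(offset, offset + perPage) };
}

// Build one query string per page, keeping the original keywords
function pageLinks(keywords, totalPages) {
  const links = [];
  for (let p = 1; p <= totalPages; p++) {
    links.push(`?keywords=${encodeURIComponent(keywords)}&page=${p}`);
  }
  return links;
}

// 25 placeholder results, numbered 1..25
const results = Array.from({ length: 25 }, (_, i) => i + 1);
const secondPage = paginate(results, '2');
console.log(secondPage.items);                      // 11 through 20
console.log(pageLinks('perl', secondPage.totalPages)[1]); // ?keywords=perl&page=2
```

Clamping the page number keeps out-of-range requests such as ?page=99 on the last valid page instead of returning an empty list.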