How do I calculate visual similarity between domains?

Domain names that look similar can be used in phishing attacks. Learn how to calculate visual similarity and detect homograph attacks.

By Inventive HQ Team
Understanding Visual Similarity in Domains

Homograph attacks use domain names that look visually similar to legitimate domains to trick users into visiting fraudulent sites. For example, "rn" looks similar to "m", and attackers exploit this confusion. Calculating visual similarity helps identify and defend against these attacks.

Understanding visual similarity metrics enables organizations to protect users, register protective variants, and detect spoofing attempts early.

Common Visual Similarity Techniques

1. Character Substitution (Homograph Attacks)

Visually similar characters:

  • l (lowercase L) vs. I (uppercase i) vs. 1 (one)
  • 0 (zero) vs. O (uppercase O)
  • rn vs. m
  • cl vs. d
  • ɪ (Latin small capital I, U+026A) vs. i (ASCII i)

Examples:

Legitimate: amazon.com
Attacks:
- amaz0n.com (zero instead of o)
- amzon.com (omitted 'a'; a typo variant rather than a substitution, but easy to misread)
- αmazon.com (Greek alpha instead of 'a')
- аmаzon.com (Cyrillic 'a' instead of ASCII 'a')

2. Internationalized Domain Names (IDN)

Using non-ASCII characters:

English: example.com
Cyrillic lookalike: exаmple.com (Cyrillic 'a' U+0430 instead of ASCII 'a')
Greek lookalike: εхаmple.com (mixed Greek and Cyrillic)

Unicode homoglyphs:

  • Look identical to users but are different characters with different code points
  • Are registered as entirely separate domains
  • Are often chosen deliberately as attack vectors
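
One quick way to spot such lookalikes is to check whether a label mixes scripts. The following is a minimal sketch that treats the first word of each character's Unicode name as its script, which is a rough heuristic rather than the official Script property. The helper names `scripts_in_label` and `is_mixed_script` are illustrative.

```python
import unicodedata

def scripts_in_label(label):
    """Return rough script names for the letters in a label.

    Assumption: the first word of the Unicode character name
    (for example "CYRILLIC" in "CYRILLIC SMALL LETTER A") stands in
    for the script.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def is_mixed_script(label):
    """True when letters from more than one script appear in the label."""
    return len(scripts_in_label(label)) > 1

# "\u0430" is Cyrillic small letter a, written as an escape so it is visible
for label in ["example", "ex\u0430mple"]:
    print(label, sorted(scripts_in_label(label)), is_mixed_script(label))
```

A label written entirely in one non-Latin script is not flagged by this check, so it works best alongside a comparison against the protected name.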

3. Domain Structure Manipulation

Exploiting domain structure:

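Legitimate: example.com
Attacks:
- example.co.m (extra dot inserted to split the name)
- example-real.com (legitimate-looking suffix appended)
- example.com.fake (the real domain becomes a subdomain of an attacker-controlled name)
- subdomain.exаmple.com (subdomain label added to a lookalike of the real name, here with Cyrillic 'а')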
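
A rough way to catch these structural tricks is to check whether the protected name shows up in a hostname whose registrable domain is something else. The sketch below assumes the registrable domain is simply the last two labels, which is wrong for suffixes such as co.uk; production code would consult the Public Suffix List. Both function names are hypothetical.

```python
def registrable_domain(hostname):
    """Naive guess: the last two labels of the hostname (an assumption)."""
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])

def embeds_brand(hostname, protected="example.com"):
    """True if the protected name appears in a hostname it does not own."""
    brand = protected.split(".")[0]
    if registrable_domain(hostname) == protected:
        return False  # the genuine domain or one of its subdomains
    return brand in hostname.lower()

for host in ["login.example.com", "example.com.fake", "example-real.com"]:
    print(host, embeds_brand(host))
```

This only looks at ASCII structure. Hostnames that also swap in lookalike characters need one of the character-level checks as well.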

Metrics for Calculating Visual Similarity

1. Levenshtein Distance

Measures character-level differences between strings.

Algorithm:

  • Count minimum edits needed to transform one string to another
  • Edit types: insert, delete, substitute
  • Lower distance = more similar

Example:

"example" vs. "exаmple" (one Cyrillic character)
Distance: 1 (one substitution needed)

"amazon" vs. "amаzоn" (two substitutions)
Distance: 2

"example" vs. "amazon"
Distance: 6 (unrelated names are far apart)

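Threshold:

  • Distance ≤ 1: Very suspicious, investigate immediately
  • Distance ≤ 2: Suspicious, possible homograph or typosquatting attempt
  • Short names produce more false positives, so scale the threshold to the name's length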
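
The algorithm can be sketched in a few lines of Python. This is a standard two-row dynamic-programming version, shown as an illustration rather than a tuned implementation. The function name `levenshtein` is our own, and the Cyrillic letters are written as escapes so the difference is visible.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when equal
            ))
        prev = curr
    return prev[-1]

print(levenshtein("example", "ex\u0430mple"))      # Cyrillic U+0430 in place of 'a'
print(levenshtein("amazon", "am\u0430z\u043en"))   # Cyrillic U+0430 and U+043E
print(levenshtein("example", "amazon"))
```

Note that edit distance treats a Cyrillic lookalike the same as any other wrong character, so it measures spelling closeness, not how alike two characters look.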

2. Damerau-Levenshtein Distance

Extends Levenshtein distance by counting a swap of two adjacent characters as a single edit.

Includes:

  • Insertions
  • Deletions
  • Substitutions
  • Transpositions (character swaps)

Example:

"example" vs. "exmaple" (transposed 'a' and 'm')
Damerau-Levenshtein: 1 (one transposition); plain Levenshtein: 2 (two substitutions)
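
A compact way to implement this is the restricted variant usually called optimal string alignment, which adds one extra case to the Levenshtein table. The sketch below assumes that variant; the unrestricted Damerau-Levenshtein algorithm can give a smaller result on some inputs. The name `osa_distance` is illustrative.

```python
def osa_distance(a, b):
    """Optimal string alignment: Levenshtein plus swaps of two adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Adjacent transposition: "am" typed as "ma"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]

print(osa_distance("example", "exmaple"))
```

Transpositions matter for typosquatting because swapped letters are one of the most common typing mistakes.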

3. Jaro-Winkler Similarity

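Scores similarity from 0 (nothing in common) to 1 (identical).

How it works:

  • Counts characters that match within a limited distance of each other
  • Penalizes matched characters that appear in a different order
  • Gives extra weight to a shared prefix (the Winkler adjustment)
  • Higher values = more similar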

Example:

"example.com" vs. "exаmple.com"
Similarity: about 0.95 (10 of 11 characters match, with the shared prefix "ex")
0.9+ = Suspicious similarity
0.95+ = Likely homograph attack
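
For readers who want to see where such a score comes from, here is a minimal sketch of Jaro-Winkler. It assumes the conventional parameters, a prefix scale of 0.1 and a prefix capped at four characters. The function names are our own.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings, from 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that line up in a different order count as transpositions
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(x != y for x, y in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3

def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    base = jaro(s1, s2)
    prefix = 0
    for x, y in zip(s1[:4], s2[:4]):
        if x != y:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)

# "\u0430" is Cyrillic small letter a
print(round(jaro_winkler("example.com", "ex\u0430mple.com"), 3))
```

Because of the prefix boost, a substitution late in the name scores higher than the same substitution in the first few characters.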

4. Visual Character Confusion Matrix

Classifies characters by visual similarity:

Group     Characters
Group 1   l, I, 1, ɪ (vertical strokes)
Group 2   0, O (circular shape)
Group 3   m, rn (similar shape)
Group 4   a, ɑ, α, а (various 'a' characters)
Group 5   e, е, ё (various 'e' characters)
Group 6   o, ο, о, ӧ (various 'o' characters)

Similarity score:

  • Same group: High similarity
  • Different group: Low similarity

5. Unicode Confusable Characters

Based on the Unicode confusables list (confusables.txt, published with Unicode Technical Standard #39):

  • Maintained by Unicode Consortium
  • Lists characters that look identical
  • Used for security checks

Example confusables:

U+0041 (ASCII A) ≈ U+0391 (Greek Alpha)
U+0435 (Cyrillic 'e') ≈ U+0065 (ASCII 'e')
U+043E (Cyrillic 'o') ≈ U+006F (ASCII 'o')

Check a character against the confusables data:

Input: а (Cyrillic, U+0430)
Confusable with: a (ASCII, U+0061)
Risk: High (visual homograph)
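
The same lookup can be sketched in code by mapping each character to its lookalike and comparing the results, an approach often described as comparing "skeletons". The table below is a small hand-picked subset for illustration only; real checks should load the full confusables data. The names `CONFUSABLES`, `skeleton`, and `is_homograph` are hypothetical.

```python
# Hand-picked subset for illustration; the real data has thousands of entries
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u03b1": "a",  # GREEK SMALL LETTER ALPHA
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}

def skeleton(text):
    """Replace each listed character with its ASCII lookalike."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in text.lower())

def is_homograph(candidate, protected):
    """True when two different strings collapse to the same skeleton."""
    return candidate != protected and skeleton(candidate) == skeleton(protected)

print(is_homograph("\u0430mazon.com", "amazon.com"))  # Cyrillic first letter
print(is_homograph("amaz0n.com", "amazon.com"))       # digit zero is not in this table
```

Unlike edit distance, this check ignores how many characters were swapped: a name with every letter replaced by a lookalike still matches.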

Practical Homograph Detection

1. Domain Registration Monitoring

Detect registered homographs:

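from difflib import SequenceMatcher

# Two swapped characters in a ten-character name give a ratio of 0.8,
# so a 0.95 cutoff would miss this example
ALERT_THRESHOLD = 0.8

def similarity_ratio(a, b):
    # Ratio of matching characters, from 0.0 (nothing shared) to 1.0 (identical)
    return SequenceMatcher(None, a, b).ratio()

# Monitor registrations
new_domain = "am\u0430z\u043en.com"  # Cyrillic 'а' (U+0430) and 'о' (U+043E)
protected_domain = "amazon.com"

score = similarity_ratio(new_domain, protected_domain)
if new_domain != protected_domain and score >= ALERT_THRESHOLD:
    print("ALERT: Homograph domain registered!")
    print(f"Similarity: {score:.2f}")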

2. Email Address Spoofing Detection

Identify spoofed email addresses:

Legitimate: john.smith@example.com
Spoofed:    jоhn.smith@еxamplе.com
            (Cyrillic о in "john", Cyrillic е twice in "example")

Detection:
- Compare the sender domain with trusted domains using Levenshtein distance
- Calculate visual similarity
- Flag for user verification
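
As a rough illustration of these steps, the sketch below classifies a sender by comparing its domain with a set of trusted domains. It uses difflib's ratio in place of Levenshtein distance to stay within the standard library. The function name `check_sender` and the 0.8 threshold are assumptions, and only the domain part is checked, not the local part.

```python
from difflib import SequenceMatcher

def check_sender(address, trusted_domains, threshold=0.8):
    """Classify a sender address as 'trusted', 'suspicious', or 'unknown'."""
    domain = address.rsplit("@", 1)[-1].lower()
    if domain in trusted_domains:
        return "trusted"
    for trusted in trusted_domains:
        # A close but not exact match is the pattern a spoofed domain produces
        if SequenceMatcher(None, domain, trusted).ratio() >= threshold:
            return "suspicious"
    return "unknown"

trusted = {"example.com"}
print(check_sender("john.smith@example.com", trusted))
# Cyrillic U+043E in the local part, U+0435 twice in the domain
print(check_sender("j\u043ehn.smith@\u0435xampl\u0435.com", trusted))
```

Anything marked suspicious would then go to the user verification step rather than being blocked outright.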

3. User Input Validation

Warn users about similar domains:

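// Classic dynamic-programming Levenshtein distance (two-row variant)
function levenshteinDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// When user types URL (showWarning is assumed to be provided by the page's UI code)
function checkVisualSimilarity(userInput) {
  const knownDomains = [
    'amazon.com', 'apple.com', 'google.com'
  ];
  const candidate = userInput.trim().toLowerCase();

  for (const known of knownDomains) {
    const distance = levenshteinDistance(candidate, known);

    // Distance 0 is an exact match, so warn only on near misses
    if (distance > 0 && distance <= 2) {
      showWarning(
        `Did you mean ${known}? ` +
        `You typed something very similar.`
      );
    }
  }
}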

Advanced Similarity Calculations

Visual Homoglyph Matrix

Create matrix of visually similar characters:

Matrix:
  l: [I, 1, |, ɪ]
  o: [0, O, ο, О, о]
  a: [α, а, ɑ]
  e: [е, ё, ε]
  m: [rn, м]
  n: [и, η, ո]

Similarity score:

  • Characters in the same group: score 0.8 for that position
  • Weight each character position by importance
  • Combine the weighted scores into an overall domain similarity
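
One possible reading of this scoring scheme is sketched below. It assumes equal weight for every position, 1.0 for an exact match, 0.8 for a lookalike, and 0.0 otherwise, and it only compares names of equal length. The table keeps single-character entries from the matrix, so the "rn" pair is left out. All names are illustrative.

```python
# Single-character entries only; multi-character pairs such as "rn" are omitted
HOMOGLYPHS = {
    "l": {"I", "1", "|"},
    "o": {"0", "O", "\u03bf", "\u041e", "\u043e"},
    "a": {"\u03b1", "\u0430", "\u0251"},
    "e": {"\u0435", "\u0451", "\u03b5"},
}

def char_score(expected, actual, lookalike_score=0.8):
    """Score one position: exact match, known lookalike, or unrelated."""
    if expected == actual:
        return 1.0
    if actual in HOMOGLYPHS.get(expected, ()):
        return lookalike_score
    return 0.0

def visual_similarity(protected, candidate):
    """Average per-position score; 0.0 when lengths differ (a simplification)."""
    if len(protected) != len(candidate) or not protected:
        return 0.0
    total = sum(char_score(p, c) for p, c in zip(protected, candidate))
    return total / len(protected)

print(round(visual_similarity("amazon.com", "amaz0n.com"), 2))  # lookalike zero
print(round(visual_similarity("amazon.com", "amaxon.com"), 2))  # unrelated letter
```

Position weights, for example counting the first characters more heavily, could be added by multiplying each score before averaging.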

Contextual Similarity

Consider domain context:

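def levenshtein_distance(a, b):
    # Two-row dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def contextual_similarity(domain1, domain2):
    # Extract components (assumes each domain contains at least one dot)
    name1, tld1 = domain1.rsplit('.', 1)
    name2, tld2 = domain2.rsplit('.', 1)

    # Different TLDs = lower similarity
    tld_similarity = 0.9 if tld1 == tld2 else 0.5

    # Name similarity: edit distance scaled by the longer name
    name_distance = levenshtein_distance(name1, name2)
    name_similarity = 1.0 - (name_distance / max(len(name1), len(name2)))

    # Combined score, weighted 80% name and 20% TLD
    return (name_similarity * 0.8) + (tld_similarity * 0.2)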

Tools for Visual Similarity Analysis

Online Tools

  • Inventive HQ Domain Spoofing Detector
  • Phishable.com - Tests domain spoofing
  • Squatm0nkey - Domain typosquatting scanner
  • SecurityTrails - Domain similarity analysis

Command-Line Tools

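# Show the Punycode (xn--) form of an internationalized name
python3 -c "print('am\u0430zon.com'.encode('idna').decode())"

# Whois check for registered variants
whois amazon.com
whois amаzon.com  # Cyrillic variant; many whois clients expect the Punycode form instead

# Unicode analysis
echo "amаzon.com" | od -An -tx1
# Shows the UTF-8 bytes; Cyrillic 'а' appears as d0 b0 instead of 61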

Programmatic Approaches

import unicodedata
from difflib import SequenceMatcher

def analyze_domain_similarity(suspicious, legitimate):
    # NFKD folds compatibility forms (fullwidth letters, ligatures) but does
    # NOT map cross-script homoglyphs such as Cyrillic 'а' to Latin 'a'
    sus_norm = unicodedata.normalize('NFKD', suspicious)
    leg_norm = unicodedata.normalize('NFKD', legitimate)

    # Basic similarity
    similarity = SequenceMatcher(None, sus_norm, leg_norm).ratio()

    # Report non-ASCII letters, which are the candidates for homoglyphs
    for char in suspicious:
        if char.isalpha() and not char.isascii():
            name = unicodedata.name(char, 'UNKNOWN')
            print(f"Character {char!r}: U+{ord(char):04X} {name}")

    return similarity

# Example
result = analyze_domain_similarity(
    "am\u0430zon.com",  # Cyrillic 'а' (U+0430)
    "amazon.com"        # ASCII domain
)
print(f"Similarity: {result:.2f}")

Protective Measures Based on Similarity

1. Register Protective Variants

Preemptively register similar-looking domains:

Primary domain: amazon.com
Variants to protect:
- amаzon.com (Cyrillic 'a')
- аmazon.com (Cyrillic 'a' at start)
- amaz0n.com (zero instead of 'o')
- amazоn.com (Cyrillic 'o')
- am4zon.com (4 instead of 'a')
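
A list like this can be generated rather than written by hand. The sketch below produces every variant that differs from the name by exactly one lookalike character, using a small substitution table drawn from the examples in this article. The table and the function name are illustrative, and the domain is assumed to contain a dot.

```python
# Illustrative table; extend it with whichever lookalikes matter for your brand
SUBSTITUTIONS = {
    "a": ["\u0430", "4"],  # Cyrillic a, digit four
    "o": ["\u043e", "0"],  # Cyrillic o, digit zero
    "l": ["1"],            # digit one
}

def single_substitution_variants(domain):
    """Return sorted variants that differ by exactly one lookalike character."""
    name, dot, tld = domain.rpartition(".")
    variants = set()
    for i, ch in enumerate(name):
        for repl in SUBSTITUTIONS.get(ch, []):
            variants.add(name[:i] + repl + name[i + 1:] + dot + tld)
    return sorted(variants)

for variant in single_substitution_variants("amazon.com"):
    # encode("unicode_escape") makes non-ASCII characters visible in the output
    print(variant.encode("unicode_escape").decode())
```

Limiting the output to single substitutions keeps the list short enough to register or monitor; allowing combinations grows it quickly.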

2. User Education

  • Teach users to check address bar carefully
  • Highlight domain in email clients
  • Verify domain when suspicious

3. Email Authentication

  • Implement DMARC strictly (p=reject)
  • Use DKIM for email signatures
  • Check the full sender domain, not just the display name

4. Technical Detection

  • Monitor domain registrations
  • Set alerts for similar domain registration
  • Check certificate transparency logs
  • Monitor phishing databases

Real-World Homograph Examples

Cyrillic 'е' Attack (U+0435)

Original: ebay.com
Attack: еbay.com (Cyrillic е)
Looks identical to users
Completely different domain

Confused 'o' and '0'

Original: microsoft.com
Attack: micr0s0ft.com (zeros instead of o's)
Typosquatting variation
Users might not notice

Greek Letter Substitution

Original: paypal.com
Attack: payρal.com (Greek rho instead of 'p')
Extremely similar appearance
Easy to miss

Conclusion

Calculating visual similarity between domains enables detection and prevention of homograph attacks. By understanding similarity metrics, implementing detection systems, and registering protective variants, organizations can:

  • Identify spoofing attempts early
  • Protect users from phishing
  • Defend brand reputation
  • Prevent credential theft

Whether using simple string distance calculations or advanced Unicode confusable detection, visual similarity analysis is essential for modern phishing prevention and domain security.
