Archived document. This file has been superseded or completed and is kept for historical reference.

Non-Indexed Pages Investigation - sempers.com

Date: January 25-26, 2026
Investigator: Claude Code Analysis
Scope: All indexing issues EXCLUDING Soft 404 and Duplicate Canonical (already addressed)


Executive Summary

This report analyzes the remaining indexing issues from Google Search Console. Most issues are intentional and correctly configured. A few items require minor attention.

Issue | Pages | Status | Action Required
Page with redirect | 40 | INTENTIONAL | None - www to non-www redirects
Alternate page with proper canonical | 8 | MOSTLY CORRECT | Review 1 page
Excluded by ‘noindex’ tag | 5 | INTENTIONAL | None
Blocked due to 401 | 1 | FIXED | robots.txt now blocks
Crawled - currently not indexed | 4 | MONITORING | Consider content enhancement

1. Page with Redirect (40 pages)

Analysis

The 40 redirect pages are caused by the www to non-www redirect configured in .htaccess:

RewriteEngine on
RewriteCond %{HTTP_HOST} ^www.sempers.com [NC]
RewriteRule ^(.*)$ https://sempers.com/$1 [L,R=301]
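
To make the effect of this rule concrete, here is a minimal Python sketch that mirrors its logic. It is an illustration only, not part of the site: the function name redirect_target is hypothetical, and the sketch assumes the rule sits in the document-root .htaccess and that no other rewrite rules interfere.

```python
import re
from urllib.parse import urlsplit

# Same pattern as the RewriteCond above; re.IGNORECASE plays the role of [NC].
WWW_HOST = re.compile(r"^www.sempers.com", re.IGNORECASE)


def redirect_target(url):
    """Return the 301 target the rule would produce, or None if it does not apply."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not WWW_HOST.search(host):
        return None  # non-www request: served directly, no redirect
    # In .htaccess context the path is matched without its leading slash,
    # so $1 is appended directly after "https://sempers.com/".
    target = "https://sempers.com/" + parts.path.lstrip("/")
    if parts.query:
        # mod_rewrite carries the original query string over by default.
        target += "?" + parts.query
    return target


print(redirect_target("https://www.sempers.com/glossary/wrongful-termination.html"))
print(redirect_target("https://sempers.com/"))
```

Each www URL maps to exactly one non-www URL with the same path, which is why the redirect count in Search Console tracks the number of www URLs Google has seen.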

Verification

$ curl -IL https://www.sempers.com/
HTTP/1.1 301 Moved Permanently
Location: https://sempers.com/
HTTP/1.1 200 OK

Why 40 Pages?

The sitemap contains ~49 URLs. Google likely crawled some or all pages via both the www and non-www hostnames. The 40 redirecting pages are the www versions.

Assessment

Aspect | Status
Is this correct? | YES
Does it hurt SEO? | NO - 301 redirects properly consolidate authority
Action needed? | NO

Recommendation

No action required. Google is reporting that it encountered 40 URLs that redirect. These are not errors: they are correctly configured 301 redirects from www to non-www, a standard way to consolidate a site onto one canonical hostname.


2. Alternate Page with Proper Canonical Tag (8 pages)

Analysis

These are pages where the canonical tag points to a different URL. Google found this and correctly chose to index the canonical target instead.

Pages Identified

Based on source code analysis:

Page | Canonical Points To | Intentional?
discrimination-lawsuits/common-forms-of-workplace-discrimination.html | filing-discrimination-claim-in-california.html | LIKELY YES
(7 others - need GSC data to identify) | Various | Unknown

Deep Dive: common-forms-of-workplace-discrimination.html

File: /Users/zjs/engineering/sempers.com/public_html/discrimination-lawsuits/common-forms-of-workplace-discrimination.html

Current canonical tag:

<link href="../discrimination-lawsuits/filing-discrimination-claim-in-california.html" rel="canonical"/>

This page canonicalizes TO filing-discrimination-claim-in-california.html

Assessment:

Likely Additional Alternate Pages

Without Google Search Console access, the other 7 pages with proper canonical tags could include:

  1. Pagination pages (if any exist with rel="canonical" to page 1)
  2. Parameter variations (URLs with tracking parameters)
  3. Multiple internal URLs pointing to same content
  4. Relative canonical resolution - some pages with relative canonicals may resolve differently than intended (this is covered in DUPLICATE_CANONICAL_INVESTIGATION.md)
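
To illustrate item 4, the sketch below resolves a relative canonical href against the URL of the page that contains it, which is how a crawler would interpret the tag. It is a minimal example using only the Python standard library; the class and function names (CanonicalFinder, resolved_canonical) are hypothetical, and the page URL assumes the non-www HTTPS host.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.href is None:
            self.href = attr.get("href")


def resolved_canonical(page_url, html):
    """Return the absolute canonical URL for a page, or None if it has no tag."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.href is None:
        return None
    # Relative hrefs are resolved against the URL of the page itself.
    return urljoin(page_url, finder.href)


page = "https://sempers.com/discrimination-lawsuits/common-forms-of-workplace-discrimination.html"
tag = '<link href="../discrimination-lawsuits/filing-discrimination-claim-in-california.html" rel="canonical"/>'
target = resolved_canonical(page, tag)
print(target)
print("self-referencing" if target == page else "points to a different page")
```

Running a check like this over every page would separate self-referencing canonicals from cross-page ones without needing the GSC export.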

Assessment

Aspect | Status
Is common-forms-of-workplace-discrimination.html correct? | LIKELY YES
Are the other 7 correct? | UNKNOWN - need GSC data
Action needed? | REVIEW in GSC

Recommendation

  1. Review in Google Search Console - Check which 8 specific pages are reported
  2. Verify intentionality - Confirm each alternate page should canonicalize to a different URL
  3. For common-forms-of-workplace-discrimination.html:
    • If intentional consolidation: No action needed
    • If it should be indexed separately: Change canonical to self-referencing absolute URL:
      <link href="https://sempers.com/discrimination-lawsuits/common-forms-of-workplace-discrimination.html" rel="canonical"/>
      

3. Excluded by ‘noindex’ Tag (5 pages)

Analysis

Searched entire public_html directory for noindex:

public_html/feedback/feedback.html
public_html/assets/guides/10-things-after-being-fired.html
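
A search like this can be reproduced with a short script. The sketch below is one possible approach, not the command actually used for this report: it walks a directory and lists HTML files whose robots meta tag contains noindex. The names RobotsMetaFinder and find_noindex_pages are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path


class RobotsMetaFinder(HTMLParser):
    """Set .noindex when a <meta name="robots"> tag lists the noindex directive."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = (attr.get("content") or "").lower()
            directives = content.replace(",", " ").split()
            if "noindex" in directives:
                self.noindex = True


def find_noindex_pages(root):
    """Return paths (relative to root) of .html files carrying a noindex meta tag."""
    hits = []
    for path in sorted(Path(root).rglob("*.html")):
        finder = RobotsMetaFinder()
        finder.feed(path.read_text(encoding="utf-8", errors="replace"))
        if finder.noindex:
            hits.append(path.relative_to(root).as_posix())
    return hits
```

Calling find_noindex_pages("public_html") from the project root would list the matching files. Note that this only inspects meta tags in the source; a noindex sent through an X-Robots-Tag HTTP header would not show up here.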

Only 2 pages have noindex tags in the source code. The other 3 reported by Google may be:

Pages with noindex Tags

1. /feedback/feedback.html

<meta name="robots" content="noindex, nofollow">

Purpose: Client feedback collection page requiring a token
Should it be noindex? YES - This is a private page for client feedback, not for public search

Additional protection: Now also blocked by robots.txt:

Disallow: /feedback/

2. /assets/guides/10-things-after-being-fired.html

<meta name="robots" content="noindex, nofollow">

Purpose: Downloadable guide/resource page
Should it be noindex? YES - This appears to be a lead magnet or downloadable resource, not meant for organic search

Additional protection: Now also blocked by robots.txt:

Disallow: /assets/guides/

Other Possible noindex Pages

The 5 total could include pages that:

Assessment

Page | Has noindex? | Intentional? | Blocked by robots.txt?
/feedback/feedback.html | YES | YES | YES
/assets/guides/10-things-after-being-fired.html | YES | YES | YES
/assets/includes/google-map.html | NO | Should have | YES
Other 3 pages | Unknown | Unknown | Need GSC data

Recommendation

  1. No action needed for existing noindex pages - They are correctly configured
  2. Add noindex to /assets/includes/google-map.html (optional - already blocked by robots.txt)
  3. Check GSC for the specific 5 pages to verify the remaining 3

4. Blocked Due to Unauthorized Request (401) - 1 page

Analysis

This is the /feedback/feedback.html page which:

Status

ALREADY FIXED - As documented in STATUS_OF_INDEXING_REPAIR.md:

Disallow: /feedback/

This was added to robots.txt on January 25, 2026.

Verification

The page is now:

  1. Blocked by robots.txt (Googlebot won’t crawl)
  2. Has noindex, nofollow meta tag (a fallback only: while robots.txt blocks the page, Googlebot cannot fetch it to read the tag)
  3. Returns 200 OK but shows error content (this is now irrelevant since blocked)

Assessment

Aspect | Status
Was this the 401 page? | LIKELY
Is it now fixed? | YES
Action needed? | NO - wait for re-crawl

Recommendation

No additional action required. Google should drop this URL from the 401 list once it has re-fetched robots.txt (expected within 24-48 hours) and picked up the Disallow directive; the report itself may take longer to refresh.


5. Crawled - Currently Not Indexed (4 pages)

Analysis

These are pages Google crawled but chose not to include in its index. Common causes:

Potential Candidates

Without GSC data, likely candidates based on site structure:

Page Type | Examples | Why Not Indexed?
Glossary pages | 44 pages in /glossary/ | May be perceived as thin content
Sitemap HTML page | /sitemap/sitemap.html | Utility page, not valuable content
Legal pages | /disclaimers/, /privacy/ | Common pages, low unique value
Blog pages | Various | May be thin or duplicate content

Glossary Page Analysis

The site has 44 glossary pages. Example content from /glossary/wrongful-termination.html:

Assessment: These could be considered thin content by Google. Each page has:

Other Thin Content Candidates

Page | Issue
/sitemap/sitemap.html | Just a list of links - utility page
/404error.html | Error page - should not be indexed anyway
Google verification files | google684d081a6b620264.html, googlede6683829bc41ed6.html

Assessment

Aspect | Status
Should these be indexed? | DEPENDS on page
Glossary pages | Could benefit from expansion
Utility pages | Should probably have noindex
Action needed? | LOW PRIORITY

Recommendations

Short-term (Optional)

  1. Check GSC for the specific 4 pages
  2. Add noindex to utility pages that shouldn’t be indexed:
    • /404error.html (should not be indexed)
    • Google verification files (should not be indexed)

Long-term (Content Improvement)

For glossary pages to be indexed:

  1. Expand content to 500+ words each
  2. Add unique value - California-specific information, examples, case studies
  3. Internal linking - Link from practice area pages to relevant glossary terms
  4. FAQs - Add frequently asked questions to each glossary page
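
To track progress toward the 500-word target in item 1, a rough word count per page is enough. The sketch below is a minimal, assumption-laden example: it counts all visible text in the HTML, including navigation and footer text, so it overstates the length of the unique content. The names TextCounter, word_count, and is_thin are hypothetical.

```python
from html.parser import HTMLParser


class TextCounter(HTMLParser):
    """Count whitespace-separated words in text outside <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.words = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words += len(data.split())


def word_count(html):
    """Return the number of visible words in an HTML document."""
    counter = TextCounter()
    counter.feed(html)
    counter.close()  # flush any buffered trailing text
    return counter.words


def is_thin(html, minimum=500):
    """True when the page falls below the word target from item 1."""
    return word_count(html) < minimum
```

A more accurate check would restrict the count to the main content container, but that depends on the site's templates, which this report does not describe.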

Summary of All Issues

Issue | Count | Root Cause | Status | Priority
Redirects | 40 | www to non-www 301s | CORRECT | None
Alternate canonical | 8 | Cross-page canonicals | MOSTLY CORRECT | LOW - Review in GSC
noindex | 5 | Intentional exclusion | CORRECT | None
401 Unauthorized | 1 | Token-gated feedback page | FIXED | None
Not indexed | 4 | Thin/utility content | MONITORING | LOW

Action Items

Immediate (No Action Required)

Review Recommended

Optional Improvements


Pages NOT in Sitemap (But Exist)

These pages exist on the site but are not in sitemap.xml:

Page | In Sitemap? | Should Be?
/case-evaluation/how-long-do-i-have-to-file.html | NO | Probably YES
/case-evaluation/how-much-is-my-case-worth.html | NO | Probably YES
/free-consultation/what-happens-when-i-call.html | NO | Probably YES
/glossary/* (44 pages) | NO | Debatable
/blog/* (some newer posts) | PARTIAL | Should update
/working-off-the-clock/clock-out-to-finish-work.html | NO | Probably YES
/disclaimers/terms-of-service.html | NO | Optional
/privacy/privacy-statement.html | NO | Optional
/404error.html | NO | NO (correct)
/feedback/feedback.html | NO | NO (correct)
/assets/* | NO | NO (correct)

Sitemap Recommendations

Consider adding to sitemap:

  1. /case-evaluation/how-long-do-i-have-to-file.html
  2. /case-evaluation/how-much-is-my-case-worth.html
  3. /free-consultation/what-happens-when-i-call.html
  4. Any newer blog posts not yet included
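
The comparison behind this list can be automated. The sketch below is a minimal example that assumes sitemap.xml follows the standard sitemaps.org urlset format; the function names and the sample data are hypothetical and only illustrate the set difference between pages on disk and pages in the sitemap.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace used by the sitemaps.org 0.9 schema.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_paths(sitemap_xml):
    """Return the set of URL paths listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {
        urlsplit(loc.text.strip()).path
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def missing_from_sitemap(site_paths, sitemap_xml):
    """Return site paths that do not appear in the sitemap, sorted."""
    return sorted(set(site_paths) - sitemap_paths(sitemap_xml))


# Hypothetical sample data for illustration only.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://sempers.com/</loc></url>
  <url><loc>https://sempers.com/privacy/privacy-statement.html</loc></url>
</urlset>"""

pages_on_disk = [
    "/",
    "/privacy/privacy-statement.html",
    "/case-evaluation/how-long-do-i-have-to-file.html",
]
print(missing_from_sitemap(pages_on_disk, SAMPLE_SITEMAP))
```

Pages that are deliberately excluded, such as /404error.html and /feedback/feedback.html, would need to be filtered out of the input list before running the comparison.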

Appendix: File Inventory

Total HTML Files: 128

Category | Count | Notes
Main content pages | ~50 | Service/practice area pages
Blog posts | ~15 | In /blog/ directory
Glossary terms | 44 | In /glossary/ directory
Case evaluation | 3 | In /case-evaluation/
Legal/utility | ~10 | Privacy, terms, sitemap, 404
Assets/includes | 3 | Maps, guides, README
Verification files | 2 | Google verification

robots.txt Current Configuration

Sitemap: https://sempers.com/sitemap.xml
User-agent: *
Crawl-delay: 10
Disallow: /feedback/
Disallow: /assets/includes/
Disallow: /assets/guides/
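
As a quick sanity check, the rules above can be evaluated with Python's standard urllib.robotparser module. This is a sketch only: Python's parser uses simple prefix matching and is not guaranteed to match Googlebot's interpretation in every edge case. The helper name is_crawlable is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content as listed in this report.
ROBOTS_TXT = """\
Sitemap: https://sempers.com/sitemap.xml
User-agent: *
Crawl-delay: 10
Disallow: /feedback/
Disallow: /assets/includes/
Disallow: /assets/guides/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(path, agent="Googlebot"):
    """Return True if the rules allow the given agent to fetch the path."""
    return parser.can_fetch(agent, "https://sempers.com" + path)


for path in (
    "/feedback/feedback.html",
    "/assets/includes/google-map.html",
    "/assets/guides/10-things-after-being-fired.html",
    "/glossary/wrongful-termination.html",
):
    print(path, "allowed" if is_crawlable(path) else "blocked")
```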

Related Documents


Last updated: January 26, 2026