Archived document. This file has been superseded or completed and is kept for historical reference.
Non-Indexed Pages Investigation - sempers.com
Date: January 25-26, 2026
Investigator: Claude Code Analysis
Scope: All indexing issues EXCLUDING Soft 404 and Duplicate Canonical (already addressed)
Executive Summary
This report analyzes the remaining indexing issues from Google Search Console. Most issues are intentional and correctly configured. A few items require minor attention.
| Issue | Pages | Status | Action Required |
|---|---|---|---|
| Page with redirect | 40 | INTENTIONAL | None - www to non-www redirects |
| Alternate page with proper canonical | 8 | MOSTLY CORRECT | Review 1 page |
| Excluded by ‘noindex’ tag | 5 | INTENTIONAL | None |
| Blocked due to 401 | 1 | FIXED | robots.txt now blocks |
| Crawled - currently not indexed | 4 | MONITORING | Consider content enhancement |
1. Page with Redirect (40 pages)
Analysis
The 40 redirect pages are caused by the www to non-www redirect configured in .htaccess:
```apache
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www.sempers.com [NC]
RewriteRule ^(.*)$ https://sempers.com/$1 [L,R=301]
```
Verification
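Using `-L` makes curl follow the redirect, so both responses are shown (other headers omitted):
```
$ curl -IL https://www.sempers.com/
HTTP/1.1 301 Moved Permanently
Location: https://sempers.com/

HTTP/1.1 200 OK
```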
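The `curl` check covers only the homepage. To spot-check other URLs from the GSC export, a minimal Python sketch like the one below could be used. It assumes HTTPS on both hosts. It also assumes the expected target is the same path (and query string) on the bare domain, which is what the RewriteRule above implies. The helper names `head_once` and `is_www_to_apex_301` are hypothetical.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def head_once(url: str) -> Tuple[int, Optional[str]]:
    """Send a single HEAD request WITHOUT following redirects.

    Returns (status code, Location header or None). Needs network access.
    """
    parts = urlsplit(url)
    conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def is_www_to_apex_301(url: str, status: int, location: Optional[str]) -> bool:
    """True if a www URL answered with a 301 to the same path on the bare host."""
    parts = urlsplit(url)
    if not parts.netloc.startswith("www."):
        return False
    apex = parts.netloc[len("www."):]
    expected = f"https://{apex}{parts.path or '/'}"
    if parts.query:
        # mod_rewrite passes the original query string through by default
        expected += "?" + parts.query
    return status == 301 and location == expected
```

With network access, `is_www_to_apex_301(url, *head_once(url))` returns True for each www URL that redirects as intended. `head_once` deliberately does not follow redirects. A multi-hop chain therefore shows up as a mismatch instead of being hidden.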
Why 40 Pages?
The sitemap contains ~49 URLs. Google likely crawled some or all pages via both:
- `https://www.sempers.com/page.html` (redirects)
- `https://sempers.com/page.html` (canonical)

The 40 redirecting pages are most likely the www versions.
Assessment
| Aspect | Status |
|---|---|
| Is this correct? | YES |
| Does it hurt SEO? | NO - 301 redirects properly consolidate authority |
| Action needed? | NO |
Recommendation
No action required. This is correct behavior. Google is simply reporting that it encountered 40 URLs that redirected. These are not errors - they are properly configured 301 redirects from www to non-www, which is the recommended approach for domain canonicalization.
2. Alternate Page with Proper Canonical Tag (8 pages)
Analysis
These are pages where the canonical tag points to a different URL. Google found this and correctly chose to index the canonical target instead.
Pages Identified
Based on source code analysis:
| Page | Canonical Points To | Intentional? |
|---|---|---|
| discrimination-lawsuits/common-forms-of-workplace-discrimination.html | filing-discrimination-claim-in-california.html | LIKELY YES |
| (7 others - need GSC data to identify) | Various | Unknown |
Deep Dive: common-forms-of-workplace-discrimination.html
File: /Users/zjs/engineering/sempers.com/public_html/discrimination-lawsuits/common-forms-of-workplace-discrimination.html
Current canonical tag:
```html
<link href="../discrimination-lawsuits/filing-discrimination-claim-in-california.html" rel="canonical"/>
```
This page canonicalizes TO filing-discrimination-claim-in-california.html
Assessment:
- The page title is “Common Forms Of | Workplace Discrimination”
- The canonical target is the main discrimination filing page
- This appears to be intentional content consolidation - directing search engines to treat the main page as the authoritative version
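The canonical href is relative, so its meaning depends on the URL of the page it sits on. A short Python check with `urllib.parse.urljoin` shows where it resolves. The page URL is assumed from the file path and the site's non-www HTTPS origin. The resolution ignores any `<base>` element, since none is mentioned in this report.

```python
from urllib.parse import urljoin

# Assumed public URL of the page, derived from its path under public_html
PAGE_URL = "https://sempers.com/discrimination-lawsuits/common-forms-of-workplace-discrimination.html"
# href copied from the page's <link rel="canonical"> tag
CANONICAL_HREF = "../discrimination-lawsuits/filing-discrimination-claim-in-california.html"

# Relative references resolve against the page's own URL (RFC 3986 rules)
resolved = urljoin(PAGE_URL, CANONICAL_HREF)
print(resolved)
# https://sempers.com/discrimination-lawsuits/filing-discrimination-claim-in-california.html
print(resolved == PAGE_URL)  # False: the page defers to a different URL
```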
Likely Additional Alternate Pages
Without Google Search Console access, the other 7 pages with proper canonical tags could include:
- Pagination pages (if any exist with rel="canonical" to page 1)
- Parameter variations (URLs with tracking parameters)
- Multiple internal URLs pointing to same content
- Relative canonical resolution - some pages with relative canonicals may resolve differently than intended (this is covered in DUPLICATE_CANONICAL_INVESTIGATION.md)
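Without GSC access, the local checkout can at least narrow down the candidates: any page whose canonical resolves to a different URL could appear in this report. Below is a minimal sketch. It assumes each file under public_html is served at the same path under `https://sempers.com/`. The names `CanonicalFinder`, `canonical_target` and `cross_page_canonicals` are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path
from typing import Iterator, Optional, Tuple
from urllib.parse import urljoin

SITE_ROOT_URL = "https://sempers.com/"  # assumption: non-www HTTPS origin


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <link .../> also arrives here via handle_startendtag
        if tag != "link" or self.href is not None:
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.href = attr["href"]


def canonical_target(page_url: str, html: str) -> Optional[str]:
    """Absolute canonical URL of a page, or None if it declares none."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return urljoin(page_url, finder.href) if finder.href else None


def cross_page_canonicals(public_html: Path) -> Iterator[Tuple[str, str]]:
    """Yield (page_url, canonical_url) for pages whose canonical is another URL."""
    for path in sorted(public_html.rglob("*.html")):
        page_url = urljoin(SITE_ROOT_URL, path.relative_to(public_html).as_posix())
        html = path.read_text(encoding="utf-8", errors="replace")
        target = canonical_target(page_url, html)
        if target is not None and target != page_url:
            yield page_url, target
```

Usage: `for page, target in cross_page_canonicals(Path("public_html")): print(page, "->", target)`. Directory index files (an `index.html` whose canonical is the bare directory URL) will appear as false positives. This only finds canonicals declared in the HTML. URLs Google discovered elsewhere, such as parameter variations, will not show up.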
Assessment
| Aspect | Status |
|---|---|
| Is common-forms-of-workplace-discrimination.html correct? | LIKELY YES |
| Are the other 7 correct? | UNKNOWN - need GSC data |
| Action needed? | REVIEW in GSC |
Recommendation
- Review in Google Search Console - Check which 8 specific pages are reported
- Verify intentionality - Confirm each alternate page should canonicalize to a different URL
- For common-forms-of-workplace-discrimination.html:
  - If intentional consolidation: No action needed
  - If it should be indexed separately: Change the canonical to a self-referencing absolute URL:

```html
<link href="https://sempers.com/discrimination-lawsuits/common-forms-of-workplace-discrimination.html" rel="canonical"/>
```
3. Excluded by ‘noindex’ Tag (5 pages)
Analysis
Searching the entire public_html directory for noindex matched two files:
```
public_html/feedback/feedback.html
public_html/assets/guides/10-things-after-being-fired.html
```
Only 2 pages have noindex tags in the source code. The other 3 reported by Google may be:
- Cached versions that have since been removed
- Pages that no longer exist (Google’s cache is stale)
- Dynamically generated pages
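The source search can be repeated with an HTML parser instead of a plain text search. That way, the word "noindex" in body copy or HTML comments is not counted. The sketch below assumes only `<meta name="robots">` and `<meta name="googlebot">` tags matter. The helper names are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path
from typing import List


class RobotsMetaFinder(HTMLParser):
    """Collect the content of every robots/googlebot meta tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() in ("robots", "googlebot"):
            self.directives.append((attr.get("content") or "").lower())


def has_noindex_meta(html: str) -> bool:
    """True if a robots/googlebot meta tag contains noindex or 'none' (which implies it)."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    for content in finder.directives:
        tokens = [t.strip() for t in content.split(",")]
        if "noindex" in tokens or "none" in tokens:
            return True
    return False


def pages_with_noindex(public_html: Path) -> List[str]:
    """Relative paths of .html files under public_html that carry a noindex meta tag."""
    return [
        path.relative_to(public_html).as_posix()
        for path in sorted(public_html.rglob("*.html"))
        if has_noindex_meta(path.read_text(encoding="utf-8", errors="replace"))
    ]
```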
Pages with noindex Tags
1. /feedback/feedback.html
```html
<meta name="robots" content="noindex, nofollow">
```
Purpose: Client feedback collection page requiring a token
Should it be noindex? YES - This is a private page for client feedback, not for public search
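Also blocked by robots.txt. Note that this block stops Googlebot from fetching the page, so it can no longer see the noindex tag. The block prevents crawling but does not by itself remove an already-known URL from the index.
```
Disallow: /feedback/
```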
2. /assets/guides/10-things-after-being-fired.html
```html
<meta name="robots" content="noindex, nofollow">
```
Purpose: Downloadable guide/resource page
Should it be noindex? YES - This appears to be a lead magnet or downloadable resource, not meant for organic search
Also blocked by robots.txt. While this block is in place, Googlebot cannot fetch the page, so it will not see the noindex tag.
```
Disallow: /assets/guides/
```
Other Possible noindex Pages
The 5 total could include pages that:
- Google cached before noindex was added
- Were deleted but still in Google’s memory
- Have noindex set via HTTP headers (not visible in source)
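A header-based noindex does not appear in the HTML, so it has to be checked on the live response. The sketch below fetches the `X-Robots-Tag` header and tests it for `noindex` (or `none`, which implies `noindex`). The parsing is deliberately simplified. The function names are hypothetical.

```python
import re
import urllib.request
from typing import Optional


def header_has_noindex(x_robots_tag: Optional[str]) -> bool:
    """True if an X-Robots-Tag value contains noindex or none.

    Simplification: a user-agent prefix such as "googlebot: noindex" is
    treated like an unprefixed rule, so bot-specific headers may over-report.
    """
    if not x_robots_tag:
        return False
    tokens = re.split(r"[\s,:]+", x_robots_tag.lower())
    return "noindex" in tokens or "none" in tokens


def fetch_x_robots_tag(url: str) -> Optional[str]:
    """HEAD the URL (following redirects) and join all X-Robots-Tag values.

    Needs network access.
    """
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=10) as resp:
        values = resp.headers.get_all("X-Robots-Tag")
    return ", ".join(values) if values else None
```

Example: `header_has_noindex(fetch_x_robots_tag("https://sempers.com/feedback/feedback.html"))`. `urlopen` raises `HTTPError` for 4xx/5xx responses, so URLs that return an error status need that exception handled.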
Assessment
| Page | Has noindex? | Intentional? | Blocked by robots.txt? |
|---|---|---|---|
| /feedback/feedback.html | YES | YES | YES |
| /assets/guides/10-things-after-being-fired.html | YES | YES | YES |
| /assets/includes/google-map.html | NO | Should have | YES |
| Other 3 pages | Unknown | Unknown | Need GSC data |
Recommendation
- No action needed for existing noindex pages - They are correctly configured
- Add noindex to /assets/includes/google-map.html (optional: the page is already blocked by robots.txt, and Googlebot will not see the tag while that block is in place)
- Check GSC for the specific 5 pages to verify the remaining 3
4. Blocked Due to Unauthorized Request (401) - 1 page
Analysis
This is the /feedback/feedback.html page which:
- Requires a valid token parameter (?t=...)
- Shows an error state when accessed without a valid token
- Returns 200 OK status but displays “Invalid Link” message
Status
ALREADY FIXED - As documented in STATUS_OF_INDEXING_REPAIR.md:
```
Disallow: /feedback/
```
This was added to robots.txt on January 25, 2026.
Verification
The page is now:
- Blocked by robots.txt (Googlebot won’t crawl)
- Has a noindex, nofollow meta tag (Googlebot cannot see it while the robots.txt block is in place, so it is not a backup for search purposes)
- Returns 200 OK but shows error content (irrelevant while the page is blocked)
Assessment
| Aspect | Status |
|---|---|
| Was this the 401 page? | LIKELY |
| Is it now fixed? | YES |
| Action needed? | NO - wait for re-crawl |
Recommendation
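No additional action required. Google will remove this from the 401 list after it re-fetches robots.txt and confirms the Disallow directive. Google generally caches robots.txt for up to 24 hours, and the report only updates once the URL itself is reprocessed, so the change may take longer than that to appear.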
5. Crawled - Currently Not Indexed (4 pages)
Analysis
These are pages Google crawled but chose not to include in its index. Common causes:
- Thin content
- Low perceived value
- Duplicate/similar content elsewhere
- Low internal/external links
Potential Candidates
Without GSC data, likely candidates based on site structure:
| Page Type | Examples | Why Not Indexed? |
|---|---|---|
| Glossary pages | 44 pages in /glossary/ | May be perceived as thin content |
| Sitemap HTML page | /sitemap/sitemap.html | Utility page, not valuable content |
| Legal pages | /disclaimers/, /privacy/ | Common pages, low unique value |
| Blog pages | Various | May be thin or duplicate content |
Glossary Page Analysis
The site has 44 glossary pages. Example content from /glossary/wrongful-termination.html:
- Page length: ~350 lines (mostly boilerplate)
- Unique content: ~200 words of definition
- Has contact form
- Has navigation
Assessment: These could be considered thin content by Google. Each page has:
- A short definition (1-2 paragraphs)
- Standard boilerplate (header, footer, contact form)
- No unique images or rich media
- Minimal internal linking to/from
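The "~200 words of definition" estimate can be repeated across all 44 glossary pages with a rough counter that skips boilerplate. This sketch assumes the header, navigation, footer and contact form are marked up with `header`, `nav`, `footer` and `form` elements. If the site uses `div`s with classes instead, `BOILERPLATE_TAGS` would need adjusting. It is a heuristic word count, not a measure Google is known to use.

```python
from html.parser import HTMLParser

# Assumption: boilerplate lives in these elements; adjust to the real markup.
BOILERPLATE_TAGS = {"title", "header", "nav", "footer", "form", "script", "style", "noscript"}


class MainTextCounter(HTMLParser):
    """Count words of text that sit outside boilerplate elements."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a boilerplate element
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.words += len(data.split())


def main_text_words(html: str) -> int:
    """Approximate number of non-boilerplate words in an HTML page."""
    counter = MainTextCounter()
    counter.feed(html)
    counter.close()
    return counter.words
```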
Other Thin Content Candidates
| Page | Issue |
|---|---|
| /sitemap/sitemap.html | Just a list of links - utility page |
| /404error.html | Error page - should not be indexed anyway |
| google684d081a6b620264.html, googlede6683829bc41ed6.html | Google verification files |
Assessment
| Aspect | Status |
|---|---|
| Should these be indexed? | DEPENDS on page |
| Glossary pages | Could benefit from expansion |
| Utility pages | Should probably have noindex |
| Action needed? | LOW PRIORITY |
Recommendations
Short-term (Optional)
- Check GSC for the specific 4 pages
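- Add noindex to utility pages that shouldn't be indexed:
  - /404error.html (should not be indexed)
  - Google verification files (should not be indexed). Editing these files risks breaking Search Console verification, so an `X-Robots-Tag: noindex` response header is the safer way to do this.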
Long-term (Content Improvement)
For glossary pages to be indexed:
- Expand content to 500+ words each
- Add unique value - California-specific information, examples, case studies
- Internal linking - Link from practice area pages to relevant glossary terms
- FAQs - Add frequently asked questions to each glossary page
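To see which glossary pages have the weakest internal linking, inbound links can be counted across the site. The sketch below assumes the pages have already been loaded into a dict mapping absolute URL to HTML (for example from the public_html checkout). `inbound_link_counts` is a hypothetical helper, and it only counts `<a href>` links.

```python
from collections import Counter
from html.parser import HTMLParser
from typing import Dict, List
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect href values of all <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def inbound_link_counts(pages: Dict[str, str]) -> Counter:
    """For each known page URL, count how many OTHER pages link to it.

    Links are resolved against the linking page and stripped of #fragments;
    each source page counts at most once per target.
    """
    counts: Counter = Counter({url: 0 for url in pages})
    for source_url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        collector.close()
        targets = set()
        for href in collector.hrefs:
            target, _fragment = urldefrag(urljoin(source_url, href))
            if target in pages and target != source_url:
                targets.add(target)
        counts.update(targets)
    return counts
```

`sorted(counts.items(), key=lambda kv: kv[1])` then lists the least-linked pages first.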
Summary of All Issues
| Issue | Count | Root Cause | Status | Priority |
|---|---|---|---|---|
| Redirects | 40 | www to non-www 301s | CORRECT | None |
| Alternate canonical | 8 | Cross-page canonicals | MOSTLY CORRECT | LOW - Review in GSC |
| noindex | 5 | Intentional exclusion | CORRECT | None |
| 401 Unauthorized | 1 | Token-gated feedback page | FIXED | None |
| Not indexed | 4 | Thin/utility content | MONITORING | LOW |
Action Items
Resolved (No Action Required)
- [x] Redirects (40) - Correct configuration
- [x] 401 page (1) - Fixed via robots.txt
- [x] noindex pages (2 found) - Intentionally excluded
Review Recommended
- [ ] Check GSC for 8 alternate canonical pages - Verify each is intentional
- [ ] Check GSC for 4 not-indexed pages - Identify which specific pages
Optional Improvements
- [ ] Add noindex to /404error.html
- [ ] Add noindex to Google verification HTML files (via an X-Robots-Tag response header rather than editing the files)
- [ ] Consider expanding glossary page content (long-term)
Pages NOT in Sitemap (But Exist)
These pages exist on the site but are not in sitemap.xml:
| Page | In Sitemap? | Should Be? |
|---|---|---|
| /case-evaluation/how-long-do-i-have-to-file.html | NO | Probably YES |
| /case-evaluation/how-much-is-my-case-worth.html | NO | Probably YES |
| /free-consultation/what-happens-when-i-call.html | NO | Probably YES |
| /glossary/* (44 pages) | NO | Debatable |
| /blog/* (some newer posts) | PARTIAL | Should update |
| /working-off-the-clock/clock-out-to-finish-work.html | NO | Probably YES |
| /disclaimers/terms-of-service.html | NO | Optional |
| /privacy/privacy-statement.html | NO | Optional |
| /404error.html | NO | NO (correct) |
| /feedback/feedback.html | NO | NO (correct) |
| /assets/* | NO | NO (correct) |
Sitemap Recommendations
Consider adding to sitemap:
- /case-evaluation/how-long-do-i-have-to-file.html
- /case-evaluation/how-much-is-my-case-worth.html
- /free-consultation/what-happens-when-i-call.html
- /working-off-the-clock/clock-out-to-finish-work.html
- Any newer blog posts not yet included
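The table above was compiled by hand. A small script could keep it current by diffing local files against sitemap.xml. The sketch below assumes the standard sitemap namespace and the `https://sempers.com` origin. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Set, Union

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: Union[str, bytes]) -> Set[str]:
    """Every <loc> URL listed in a sitemap.xml document."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def missing_from_sitemap(local_paths: Iterable[str], sitemap_xml: Union[str, bytes],
                         origin: str = "https://sempers.com") -> List[str]:
    """Local page paths (e.g. '/glossary/x.html') whose URL is not in the sitemap."""
    listed = sitemap_urls(sitemap_xml)
    return sorted(p for p in local_paths if origin + p not in listed)
```

Paths that are correctly absent (/assets/, /feedback/, /404error.html) should be filtered out before reviewing the result.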
Appendix: File Inventory
Total HTML Files: 128
| Category | Count | Notes |
|---|---|---|
| Main content pages | ~50 | Service/practice area pages |
| Blog posts | ~15 | In /blog/ directory |
| Glossary terms | 44 | In /glossary/ directory |
| Case evaluation | 3 | In /case-evaluation/ |
| Legal/utility | ~10 | Privacy, terms, sitemap, 404 |
| Assets/includes | 3 | Maps, guides, README |
| Verification files | 2 | Google verification |
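Several counts above are approximate ("~"). The sketch below recounts HTML files by top-level directory. The table's categories do not map one-to-one onto directories (main content pages span many folders), so it gives raw per-directory numbers to reconcile by hand. The function names are hypothetical.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable, List


def count_by_section(relative_paths: Iterable[str]) -> Counter:
    """Group .html paths by first directory; root-level files go under '(root)'."""
    counts: Counter = Counter()
    for rel in relative_paths:
        if not rel.endswith(".html"):
            continue
        parts = rel.strip("/").split("/")
        counts[parts[0] if len(parts) > 1 else "(root)"] += 1
    return counts


def html_files(public_html: Path) -> List[str]:
    """All .html files under public_html, as POSIX paths relative to it."""
    return [p.relative_to(public_html).as_posix() for p in public_html.rglob("*.html")]
```

For example, `count_by_section(html_files(Path("public_html")))` should total 128 if the inventory above is current.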
robots.txt Current Configuration
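```
Sitemap: https://sempers.com/sitemap.xml
User-agent: *
Crawl-delay: 10
Disallow: /feedback/
Disallow: /assets/includes/
Disallow: /assets/guides/
```
Googlebot ignores the Crawl-delay rule. It only affects crawlers that support it, such as Bingbot.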
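The effect of these rules on specific URLs can be checked offline with Python's standard `urllib.robotparser`. Its matching is simpler than Google's: it applies the first matching prefix rule rather than Google's longest-match logic. That makes no difference here, because the file contains only Disallow rules in a single group.

```python
from urllib.robotparser import RobotFileParser

# robots.txt contents as documented in this report
ROBOTS_TXT = """\
Sitemap: https://sempers.com/sitemap.xml
User-agent: *
Crawl-delay: 10
Disallow: /feedback/
Disallow: /assets/includes/
Disallow: /assets/guides/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in (
    "/feedback/feedback.html?t=abc",
    "/assets/guides/10-things-after-being-fired.html",
    "/assets/includes/google-map.html",
    "/glossary/wrongful-termination.html",
):
    allowed = parser.can_fetch("Googlebot", "https://sempers.com" + path)
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
# Expected: the first three are blocked, the glossary page is allowed.
```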
Related Documents
- STATUS_OF_INDEXING_REPAIR.md - Soft 404 and 401 fixes
- DUPLICATE_CANONICAL_INVESTIGATION.md - Relative canonical URL issues
- NGINX_404_INVESTIGATION.md - 404 handling investigation
Last updated: January 26, 2026