Sitemap.xml for Alloy blog entries

Now that I have a blog, I have been waiting eagerly for the Google crawler. Some call me impatient. So I thought it would be good to have a secondary sitemap for Google to speed things up a bit. I wrote a small PHP script for this and thought other people might need it too. Simply create the file as sitemap_blog.php in the root directory of the website, change the parameters as needed, and then add it as an additional sitemap in Google Search Console.

Disclaimer 1: Attention, this is my own development and has nothing to do with Elixir. There is no support if you have problems using it. You should have basic PHP knowledge and take care to adapt the variables at the beginning of the file to your setup.

Disclaimer 2: I’m actually not a PHP developer, hope the code is still reasonably clean.

<?php
header('Content-type: application/xml');     
// Output format as defined here: https://www.google.com/sitemaps/protocol.html

// Create this file in your root folder, e.g. https://www.example.com/sitemap_blog.php
// It will not work if you put it somewhere else!

// Change as needed:
// If your blog is located at example.com/blog, this is what you need.
// If it is at example.com/myblog, change the path to "./myblog/files/spyc.php".
// Don't remove the dot at the beginning of the path.
require_once "./blog/files/spyc.php";

// Again, change it to the real url you use for your blog:
$url_prefix       = 'https://www.example.de/blog/';  

// This is the "Posts folder" as defined in Alloy; change it to the name you used there!
$blog_files       = '/blog_files';

// Change frequency of a single blog entry (usually rarely changed after publishing)
// Don't change it, if you don't know what this is about
$change_frequency = 'monthly';                    

// That's it, this part should need no changes by you
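// Read all files
$files = array_diff(scandir(__DIR__ . $blog_files), array('.', '..'));
// Collected sitemap entries; stays an empty array if no post is published yet
$sitemap_posts = array();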
 
// Scan for all non draft or future posts
foreach($files as $file) {
  $fileExt = pathinfo($file, PATHINFO_EXTENSION);
 
  // Compare (===) instead of assigning (=), otherwise every file is treated as a post
  if ($fileExt === "md") {

    // Get date of post
    $splitFilename = (explode("_",$file));
    $originalDate  = $splitFilename[0];
    $todaysDate    = date("Y-m-d");
 
    if (strtotime($originalDate) <= strtotime($todaysDate)) {
      // Blog entry has no future date
      // Read content now
      // scandir() returns bare filenames, so prepend the posts folder
      $fileContents = file_get_contents(__DIR__ . $blog_files . '/' . $file);
      $fileTime     = filemtime(__DIR__ . $blog_files . '/' . $file);
      $parts        = preg_split('/[\n]*[-]{3}[\n]/', $fileContents, 3);
      $postID       = (explode(".md",$splitFilename[1]));
		
      // Parse YAML part
      $frontMatter  = spyc_load_file($parts[1]);
 
      // empty() also covers posts whose front matter has no "draft" key at all
      if (empty($frontMatter['draft'])) {
        // No draft, no future post, ready for output
        $sitemap_posts[] = array(
          'url'           => $url_prefix . '?id=' . $postID[0],
          'last_modified' => date("Y-m-d", $fileTime),
        );
      }
    }
  } 
};
 
$output  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
$output .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
echo $output;

foreach($sitemap_posts as $sitemap_out) {
  echo '<url>' . "\n";
  echo '<loc>' . $sitemap_out['url'] . '</loc>' . "\n";
  echo '<lastmod>' . $sitemap_out['last_modified'] . '</lastmod>' . "\n";
  echo '<changefreq>' . $change_frequency . '</changefreq>' . "\n";
  echo '</url>' . "\n";
}
echo '</urlset>';
?>
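
The script relies on a naming convention that is only implicit in the code: each post file is expected to look like `<date>_<post-id>.md`, where the part before the first underscore is a date that `strtotime()` can parse. Here is a minimal sketch of just that parsing step, which may help when checking your own filenames. The helper name `parse_post_filename` and the sample filename are made up for illustration and are not part of Alloy or of the script above.

```php
<?php
// Hypothetical helper (not part of Alloy): splits a post filename into
// its date prefix and post ID, mirroring the explode() calls in the script.
function parse_post_filename($file) {
    if (pathinfo($file, PATHINFO_EXTENSION) !== 'md') {
        return null; // not a Markdown post
    }
    // Like the script, only the segment right after the first underscore
    // is used as the ID, so an ID containing underscores would be cut short.
    $split = explode('_', pathinfo($file, PATHINFO_FILENAME));
    if (count($split) < 2 || strtotime($split[0]) === false) {
        return null; // no parseable date prefix
    }
    return array('date' => $split[0], 'id' => $split[1]);
}

// Sample filename, chosen for illustration only
$post = parse_post_filename('2020-01-15_hello-world.md');
echo 'https://www.example.de/blog/?id=' . $post['id'] . "\n";
```

If the helper returns `null` for one of your files, the sitemap script would most likely not produce a usable entry for it either.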

This works well, thanks.

I realise the OP doesn’t seem to be around now, but if he is and is reading this…

The only “issue” I can see is that the script doesn’t take friendly URLs into account. I’m not sure if this will affect SEO or not, but if it can be added, it’d be nice :slight_smile:

Even the “not” beautiful URLs are found very well by Google and get top rankings.

I run many blogs with Alloy, with and without pretty URLs. There is no SEO difference between pretty URLs and “not” pretty URLs.

But I still prefer Alloy with pretty URLs because I can set up a working redirect (301) there.

But by the way: is it actually enough to just add the rss.xml file generated by Alloy to the sitemaps in Google Search Console …

The pages are usually in the index after 2 days.