
Web Scraping & Data Extraction Using The SEO Spider Tool

This tutorial walks you through how you can use the Screaming Frog SEO Spider’s custom extraction feature, to scrape data from websites.

The custom extraction feature allows you to scrape any data from the HTML of a web page using CSSPath, XPath and regex. The extraction is performed on the static HTML returned from URLs crawled by the SEO Spider which return a 200 ‘OK’ response. You can switch to JavaScript rendering mode to extract data from the rendered HTML.
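As a rough illustration of what a custom extraction does under the hood, the Python sketch below fetches the static HTML of a URL and runs an XPath, a CSS selector and a regex over it. The URL, selectors and pattern are hypothetical, and the requests, lxml and cssselect libraries are assumed to be installed; this is not how the SEO Spider itself is implemented, just the same idea expressed in code.

```python
# Fetch the static HTML of a page and extract data with XPath, CSS and regex.
# The URL and all selectors below are hypothetical examples.
import re
import requests
from lxml import html

url = "https://example.com/blog/post"
response = requests.get(url, timeout=10)

if response.status_code == 200:          # only 200 'OK' responses are extracted
    tree = html.fromstring(response.text)

    # XPath: text of every h3 with class "author" (hypothetical selector)
    authors = tree.xpath('//h3[@class="author"]/text()')

    # CSS selector equivalent (requires the cssselect package)
    intros = [el.text_content() for el in tree.cssselect("p.intro")]

    # Regex: pull a publication date out of the raw HTML (hypothetical pattern)
    dates = re.findall(r'datePublished"\s*:\s*"([^"]+)"', response.text)

    print(authors, intros, dates)
```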


To get started, you’ll need to download & install the SEO Spider software and have a licence to access the custom extraction feature necessary for scraping. You can download via the buttons in the right hand side bar.

When you have the SEO Spider open, the next steps to start extracting data are as follows –

Small Update – Version 13.1 Released 15th July 2020

We have just released a small update to version 13.1 of the SEO Spider. This release is mainly bug fixes and small improvements –

  • We’ve introduced two new reports for Google Rich Result features discovered in a crawl under ‘Reports > Structured Data’. There’s a summary of features and number of URLs they affect, and a granular export of every rich result feature detected.
  • Fix issue preventing start-up running on macOS Big Sur Beta
  • Fix issue with users unable to open .dmg on macOS Sierra (10.12).
  • Fix issue with Windows users not being able to run when they have Java 8 installed.
  • Fix TLS handshake issue connecting to some GoDaddy websites using Windows.
  • Fix crash in PSI.
  • Fix crash exporting the Overview Report.
  • Fix scaling issues on Windows using multiple monitors, different scaling factors etc.
  • Fix encoding issues around URLs with Arabic text.
  • Fix issue when amending the ‘Content Area’.
  • Fix several crashes running Spelling & Grammar.
  • Fix several issues around custom extraction and XPaths.
  • Fix sitemap export display issue using Turkish locale.

2) Finding Poor-Quality Titles and Descriptions

Unique metadata is important for SEO. Keywords need to be used correctly in the title tag and meta description. A good title should be no longer than 60 characters, with the keyword placed near its beginning, and a length of around 160 characters is recommended for the meta description. The Screaming Frog SEO Spider works the same way search engine optimisation professionals do: it flags titles and descriptions that are too long or otherwise unsuitable for the search engine, and it displays the title results separately, showing the URL, occurrences, length and content. You can then work through the identified issues and make the necessary corrections.
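The length checks described above are easy to reproduce outside the tool. Below is a minimal sketch assuming the guideline limits of roughly 60 characters for titles and 160 for meta descriptions; the page data is made up for illustration.

```python
# Flag titles and meta descriptions that exceed the common guideline lengths.
pages = [
    {"url": "https://example.com/", "title": "Example Shop – Buy Widgets Online",
     "description": "Buy high-quality widgets online with free delivery."},
]

TITLE_LIMIT = 60
DESCRIPTION_LIMIT = 160

for page in pages:
    issues = []
    if len(page["title"]) > TITLE_LIMIT:
        issues.append(f"title too long ({len(page['title'])} chars)")
    if len(page["description"]) > DESCRIPTION_LIMIT:
        issues.append(f"description too long ({len(page['description'])} chars)")
    if issues:
        print(page["url"], "->", "; ".join(issues))
```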

How to Crawl a Sitemap in Screaming Frog SEO Spider: A Guide

An SEO's task is to analyse the sitemap for errors and optimise it for search engines, so that it contains only high-quality web pages that are open to search. It is exactly these pages that should then appear in the search engine results pages (SERPs). A distinctive feature of Screaming Frog is that it can crawl even all of a site's internal XML sitemap files, which is not always possible with various online services.

So, let's move on to the guide.

1. Launch Screaming Frog and go to Configuration -> Spider. Then, on the Crawl tab, scroll down to the XML Sitemaps section and tick the 'Crawl Linked XML Sitemaps' and 'Crawl These Sitemaps' checkboxes (untick 'Auto Discover XML Sitemaps'). A text field will become available; paste the sitemap URL into it and click OK:

2. Paste the URL of the main domain into the 'Enter URL to spider' field and click 'Start' to launch an automatic crawl:

3. Now go to the Crawl Analysis menu and select Configure. In the window that opens, leave only the 'Sitemaps' checkbox enabled. Click OK to save the changes, then go to Crawl Analysis once more and click Start, and Screaming Frog will audit the XML sitemap:

Alternatively, you can crawl an XML sitemap in a simpler way: switch to Mode -> List and then click Upload -> Download XML Sitemap. In the window that appears, enter the sitemap URL and click OK.

Screaming Frog will show how many web pages it has crawled; click OK:
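For context, the 'Download XML Sitemap' step essentially fetches the sitemap and reads its <loc> entries. The sketch below does the same in Python with requests and the standard library XML parser; the sitemap URL is a placeholder, and a real sitemap index would point at further child sitemaps.

```python
# Fetch a sitemap and collect its <loc> URLs (placeholder sitemap URL).
import requests
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://example.com/sitemap.xml"
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(requests.get(SITEMAP_URL, timeout=10).content)

# A sitemap index points to child sitemaps; a regular sitemap lists page URLs.
if root.tag.endswith("sitemapindex"):
    urls = [loc.text for loc in root.findall("sm:sitemap/sm:loc", NS)]
else:
    urls = [loc.text for loc in root.findall("sm:url/sm:loc", NS)]

print(f"Found {len(urls)} URLs in {SITEMAP_URL}")
```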

Small Update – Version 13.2 Released 4th August 2020

We have just released a small update to version 13.2 of the SEO Spider. This release is mainly bug fixes and small improvements –

  • We first released custom search back in 2011 and it was in need of an upgrade. So we’ve updated functionality to allow you to search within specific elements, entire tracking tags and more. Check out our custom search tutorial.
  • Sped up near duplicate crawl analysis.
  • Google Rich Results Features Summary export has been ordered by number of URLs.
  • Fix bug with Near Duplicates Filter not being populated when importing a .seospider crawl.
  • Fix several crashes in the UI.
  • Fix PSI CrUX data incorrectly labelled as sec.
  • Fix spell checker incorrectly checking some script content.
  • Fix crash showing near duplicates details panel.
  • Fix issue preventing users with dual stack networks from crawling on Windows.
  • Fix crash using Wacom tablet on Windows 10.
  • Fix spellchecker filters missing when reloading a crawl.
  • Fix crash on macOS around multiple screens.
  • Fix crash viewing gif in the image details tab.
  • Fix crash canceling during database crawl load.

How Does SEO Work?

Google (and Bing, which also powers Yahoo search results) scores its search results largely based upon the relevancy and authority of the pages it has crawled and included in its web index, relative to a user’s query, in order to provide the best answer.

Google uses over 200 signals in scoring their search results and SEO encompasses technical and creative activities to influence and improve some of those known signals. It’s often useful to not focus too much on individual ranking signals and look at the wider goal of Google, to provide the best answers for its users.

SEO, therefore, involves making sure a website is accessible, technically sound, uses the words people type into the search engines, and provides an excellent user experience, with useful, high-quality, expert content that helps answer the user’s query.

Google has a very large team of search quality raters who evaluate the quality of search results, and their ratings are fed into a machine learning algorithm. Google’s search quality rater guidelines provide plenty of detail and examples of what Google classes as high or low quality content and websites, and their emphasis on wanting to reward sites that clearly show their expertise, authority and trust (EAT).

Google uses a hyperlink based algorithm (known as ‘PageRank’) to calculate the popularity and authority of a page, and while Google is far more sophisticated today, this is still a fundamental signal in ranking. SEO can therefore also include activity to help improve the number and quality of ‘inbound links’ to a website, from other websites. This activity has historically been known as ‘link building’, but is really just marketing a brand with an emphasis online, through content or digital PR for example.

Relevant and reputable websites linking to a website is a strong signal to Google that it might be of interest to its users, and can be trusted to appear in the search results for relevant queries.

1) Dark Mode

While arguably not the most significant feature in this release, it is used throughout the screenshots – so it makes sense to talk about first. You can now switch to dark mode, via ‘Config > User Interface > Theme > Dark’.

Not only will this help reduce eye strain for those that work in low light (everyone living in the UK right now), it also looks super cool – and is speculated (by me now) to increase your technical SEO skills significantly.

The non-eye-strained among you may notice we’ve also tweaked some other styling elements and graphs, such as those in the right-hand overview and site structure tabs.

2) Small Tweaks

We also made a few smaller updates which include –

  • A new configuration for ‘User Interface’ which allows graphs to be enabled and disabled. There are performance issues with graphs on macOS. A bug has been raised with Oracle and we are pressing for a fix. In the meantime, affected users can work around this by disabling graphs or opening the app in low resolution mode. A restart is required for this to take effect.
  • We have also introduced a warning for affected Mac users on start up (and in their UI settings) that they can either disable graphs or open in low resolution mode to improve performance.
  • Mac memory allocation settings can now persist when the app is reinstalled rather than be overwritten. There is a new way of configuring memory settings, detailed in our documentation.
  • We have further optimised graphs to only update when visible.
  • We re-worded the spider authentication pop up, which often confused users who thought it was an SEO Spider login!
  • We introduced a new pop-up message for memory related crashes.

7) Web Forms Authentication (Crawl Behind A Login)

The SEO Spider has supported standards-based authentication for some time, which enables users to crawl staging and development sites. However, other web forms and areas which require you to log in with cookies have been inaccessible, until now.

We have introduced a new ‘authentication’ configuration (under ‘Configuration > Authentication’), which allows users to log in to any web form within the SEO Spider Chromium browser, and then crawl it.

This means virtually all password-protected areas, intranets and anything which requires a web form login can now be crawled.

Please note – This feature is extremely powerful and often areas behind logins will contain links to actions which a user doesn’t want to press (for example ‘delete’). The SEO Spider will obviously crawl every link, so please use responsibly, and not on your precious fantasy football team. With great power comes great responsibility(!).

You can block the SEO Spider from crawling links or areas by using the exclude configuration or a custom robots.txt.

3) GA & GSC Not Matched Report

The ‘GA Not Matched’ report has been replaced with the new ‘GA & GSC Not Matched Report’, which now provides consolidated information on URLs discovered via the Google Search Analytics API, as well as the Google Analytics API, but which were not found in the crawl.

This report can be found under ‘reports’ in the top level menu and will only populate when you have connected to an API and the crawl has finished.

There’s a new ‘source’ column next to each URL, which details the API(s) in which it was discovered (sometimes this can be both GA and GSC) but not found to match any URLs within the crawl.

You can see in the example screenshot above from our own website, that there are some URLs with mistakes, a few orphan pages and URLs with hash fragments, which can show as quick links within meta descriptions (and hence why their source is GSC rather than GA).

I’ve discussed previously how this data can be used in more detail, and it’s a real hidden gem, as it can help identify orphan pages and other errors, as well as simple matching problems between the crawl and the API(s) to investigate.
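Conceptually, the ‘not matched’ logic is a set difference: URLs reported by the APIs that never appeared in the crawl, tagged with where they came from. A small illustrative sketch, using made-up URLs:

```python
# URLs reported by the APIs that were never found in the crawl,
# with a 'source' label showing which API(s) reported them.
crawled = {"https://example.com/", "https://example.com/about"}
from_ga = {"https://example.com/", "https://example.com/old-page"}
from_gsc = {"https://example.com/about", "https://example.com/post#section"}

not_matched = {}
for url in (from_ga | from_gsc) - crawled:
    sources = [name for name, urls in (("GA", from_ga), ("GSC", from_gsc)) if url in urls]
    not_matched[url] = " & ".join(sources)   # the 'source' column

for url, source in not_matched.items():
    print(source, url)   # e.g. orphan pages, URLs with hash fragments, typos
```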

2) Google Sheets Export

You’re now able to export directly to Google Sheets.

You can add multiple Google accounts and connect to any, quickly, to save your crawl data which will appear in Google Drive within a ‘Screaming Frog SEO Spider’ folder, and be accessible via Sheets.

Many of you will already be aware that Google Sheets isn’t really built for scale and has a 5m cell limit. This sounds like a lot, but when you have 55 columns by default in the Internal tab (which can easily triple depending on your config), it means you can only export around 90k rows (55 x 90,000 = 4,950,000 cells).
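The arithmetic is easy to sanity-check yourself; a tiny sketch, assuming the 5,000,000 cell cap:

```python
# Google Sheets caps a spreadsheet at 5,000,000 cells, so the number of
# exportable rows depends on how many columns your crawl data has.
SHEETS_CELL_LIMIT = 5_000_000

def max_exportable_rows(columns: int) -> int:
    """Rows that fit under the cell limit for a given column count."""
    return SHEETS_CELL_LIMIT // columns

print(max_exportable_rows(55))    # default Internal tab columns -> 90,909 rows
print(max_exportable_rows(165))   # triple the columns -> 30,303 rows
```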

If you need to export more, use a different export format that’s built for the size (or reduce your number of columns). We had started work on writing to multiple sheets, but really, Sheets shouldn’t be used in that way.

This has also been integrated into scheduling and the command line. This means you can schedule a crawl, which automatically exports any tabs, filters, exports or reports to a Sheet within Google Drive.

You’re able to choose to create a timestamped folder in Google Drive, or overwrite an existing file.

This should be helpful when sharing data in teams, with clients, or for Google Data Studio reporting.

1) PageSpeed Insights Integration – Lighthouse Metrics, Opportunities & CrUX Data

You’re now able to gain valuable insights about page speed during a crawl. We’ve introduced a new ‘PageSpeed’ tab and integrated the PSI API which uses Lighthouse, and allows you to pull in Chrome User Experience Report (CrUX) data and Lighthouse metrics, as well as analyse speed opportunities and diagnostics at scale.

The field data from CrUX is super useful for capturing real-world user performance, while Lighthouse lab data is excellent for debugging speed related issues and exploring the opportunities available. The great thing about the API is that you don’t need to use JavaScript rendering, all the heavy lifting is done off box.

You’re able to choose and configure over 75 metrics, opportunities and diagnostics (under ‘Config > API Access > PageSpeed Insights > Metrics’) to help analyse and make smarter decisions related to page speed.
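If you want to see the kind of data involved, the sketch below calls the public PageSpeed Insights v5 API directly, which is the same API the integration uses. The example URL and YOUR_API_KEY are placeholders, and field (CrUX) data only comes back when Google has enough real-world samples for the page.

```python
# Pull Lighthouse lab data and CrUX field data for one URL from the PSI v5 API.
import requests

ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"
params = {
    "url": "https://example.com/",   # placeholder page
    "key": "YOUR_API_KEY",           # placeholder API key
    "strategy": "mobile",
}
data = requests.get(ENDPOINT, params=params, timeout=60).json()

lighthouse = data["lighthouseResult"]
print("Performance score:", lighthouse["categories"]["performance"]["score"])
print("Speed Index:", lighthouse["audits"]["speed-index"]["displayValue"])

# Field data is only present when CrUX has enough real-world samples.
crux = data.get("loadingExperience", {}).get("metrics", {})
if crux:
    print("CrUX LCP (ms):", crux["LARGEST_CONTENTFUL_PAINT_MS"]["percentile"])
```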

(The irony of releasing pagespeed auditing, and then including a gif in the blog post.)

In the PageSpeed tab, you’re able to view metrics such as performance score, TTFB, first contentful paint, speed index, time to interactive, as well as total requests, page size, counts for resources and potential savings in size and time – and much, much more.

There are 19 filters for opportunities and diagnostics to help identify potential speed improvements from Lighthouse.

Click on a URL in the top window and then the ‘PageSpeed Details’ tab at the bottom, and the lower window populates with metrics for that URL, ordering opportunities by those that will make the most impact at page level based upon Lighthouse savings.

By clicking on an opportunity in the lower left-hand window panel, the right-hand window panel then displays more information on the issue, such as the specific resources with potential savings.

As you would expect, all of the data can be exported in bulk via ‘Bulk Export’ in the top-level menu.

There’s also a very cool ‘PageSpeed Opportunities Summary’ report, which summarises all the opportunities discovered across the site, the number of URLs each affects, and the average and total potential saving in size and milliseconds, to help prioritise them at scale, too.

As well as bulk exports for each opportunity, there’s a CSS coverage report which highlights how much of each CSS file is unused across a crawl and the potential savings.

Please note, using the PageSpeed Insights API (like the interface) can currently affect analytics. Google are aware of the issue, and we have included details on how to prevent it from inflating analytics data.

5) Internal Link Score

A useful way to evaluate and improve internal linking is to calculate internal PageRank of URLs, to help get a clearer understanding about which pages might be seen as more authoritative by the search engines.

The SEO Spider already reports on a number of useful metrics to analyse internal linking, such as crawl depth, the number of inlinks and outlinks, the number of unique inlinks and outlinks, and the percentage of overall URLs that link to a particular URL. To aid this further, we have now introduced an advanced ‘link score’ metric, which calculates the relative value of a page based upon its internal links.

This uses a relative 0-100 point scale from least to most value for simplicity, which allows you to determine where internal linking might be improved.

The link score metric algorithm takes into consideration redirects, canonicals, nofollow and much more, which we will go into in more detail in another post.
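The exact link score algorithm isn’t published, but as a rough intuition the sketch below runs a generic PageRank-style iteration over a tiny, made-up internal link graph and rescales the result onto the 0-100 scale described above. It ignores redirects, canonicals and nofollow, which the real metric accounts for.

```python
# Generic PageRank-style iteration over a hypothetical internal link graph,
# rescaled to a relative 0-100 'link score'. Not the tool's actual algorithm.
links = {                      # page -> pages it links to
    "/": ["/products", "/blog", "/contact"],
    "/products": ["/", "/contact"],
    "/blog": ["/", "/products"],
    "/contact": ["/"],
}
pages = list(links)
damping, score = 0.85, {p: 1.0 / len(pages) for p in pages}

for _ in range(50):                                   # iterate to convergence
    new = {p: (1 - damping) / len(pages) for p in pages}
    for page, outlinks in links.items():
        for target in outlinks:
            new[target] += damping * score[page] / len(outlinks)
    score = new

low, high = min(score.values()), max(score.values())
link_score = {p: round(100 * (s - low) / (high - low)) for p, s in score.items()}
print(link_score)   # 0-100 scale from least to most internal link value
```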

This is a relative mathematical calculation, which can only be performed at the end of a crawl when all URLs are known. Previously, every calculation within the SEO Spider had been performed at run-time during a crawl, which leads us on to the next feature.

3) Crawl Overview Right Hand Window Pane

We received a lot of positive response to our crawl overview report when it was released last year. However, we felt that it was a little hidden away, so we have introduced a new right-hand window which includes the crawl overview report by default. This overview pane updates alongside the crawl, which means you can see which tabs and filters are populated at a glance during the crawl, along with their respective percentages.

This means you don’t need to click on the tabs and filters to uncover issues, you can just browse and click on these directly as they arise. The ‘Site structure’ tab provides more detail on the depth and most linked to pages without needing to export the ‘crawl overview’ report or sort the data. The ‘response times’ tab provides a quick overview of response time from the SEO Spider requests. This new window pane will be updated further in the next few weeks.

You can choose to hide this window, if you prefer the older format.

Small Update – Version 14.2 Released 16th February 2021

We have just released a small update to version 14.2 of the SEO Spider. This release includes a couple of cool new features, alongside lots of small bug fixes.

Core Web Vitals Assessment

We’ve introduced a ‘Core Web Vitals Assessment’ column in the PageSpeed tab with a ‘Pass’ or ‘Fail’ using field data collected via the PageSpeed Insights API for Largest Contentful Paint, First Input Delay and Cumulative Layout Shift.

For a page to ‘pass’ the Core Web Vital Assessment it must be considered ‘Good’ in all three metrics, based upon Google’s various thresholds for each. If there’s no data for the URL, then this will be left blank.
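As a rough sketch of that pass/fail logic, assuming Google’s published ‘Good’ thresholds of LCP ≤ 2500 ms, FID ≤ 100 ms and CLS ≤ 0.1:

```python
# Pass/fail check against the published 'Good' thresholds for the three
# Core Web Vitals field metrics; missing data means no verdict (left blank).
def core_web_vitals_assessment(lcp_ms=None, fid_ms=None, cls=None):
    if None in (lcp_ms, fid_ms, cls):
        return ""                      # no CrUX field data -> left blank
    good = lcp_ms <= 2500 and fid_ms <= 100 and cls <= 0.1
    return "Pass" if good else "Fail"

print(core_web_vitals_assessment(lcp_ms=2100, fid_ms=12, cls=0.05))   # Pass
print(core_web_vitals_assessment(lcp_ms=3400, fid_ms=12, cls=0.05))   # Fail
print(core_web_vitals_assessment())                                   # blank
```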

This should help identify problematic sections and URLs more efficiently. Please see our tutorial on How To Audit Core Web Vitals.

Broken Bookmarks (or ‘Jump Links’)

Bookmarks are a useful way to link users to a specific part of a webpage using named anchors on a link, also referred to as ‘jump links’ or ‘anchor links’. However, they frequently become broken over time – even for Googlers.

To help with this problem, there’s now a check in the SEO Spider which crawls URLs with fragment identifiers and verifies that an accurate ID exists within the HTML of the page for the bookmark.
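The check itself is simple to picture: for a URL with a fragment, fetch the page and confirm an element with that id (or a named anchor) exists. A rough sketch with a placeholder URL, assuming requests and lxml:

```python
# Verify that the fragment in a 'jump link' matches an id or named anchor
# in the target page's HTML. The link below is a placeholder.
from urllib.parse import urldefrag
import requests
from lxml import html

link = "https://example.com/guide#setup"
page_url, fragment = urldefrag(link)

tree = html.fromstring(requests.get(page_url, timeout=10).text)
targets = tree.xpath(f'//*[@id="{fragment}"] | //a[@name="{fragment}"]')

print("OK" if targets else f"Broken bookmark: #{fragment} not found on {page_url}")
```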

You can enable ‘Crawl Fragment Identifiers’ under ‘Config > Spider > Advanced’, and then view any broken bookmarks under the URL tab and new ‘Broken Bookmark’ filter.

You can view the source pages these are on by using the ‘inlinks’ tab, and export in bulk via a right click ‘Export > Inlinks’. Please see our tutorial on How To Find Broken Bookmark & Jump Links.

14.2 also includes the following smaller updates and bug fixes.

  • Improve labeling in all HTTP headers report.
  • Update some column names to make more consistent – For those that have scripts that work from column naming, these include – Using capital case for ‘Length’ in h1 and h2 columns, and pluralising ‘Meta Keywords’ columns from singular to match the tab.
  • Update link score graph calculations to exclude self referencing links via canonicals and redirects.
  • Make srcset attributes parsing more robust.
  • Update misleading message in visualisations around respecting canonicals.
  • Treat HTTP response headers as case insensitive.
  • Relax Job Posting value property type checking.
  • Fix issue where right click ‘Export > Inlinks’ sometimes doesn’t export all the links.
  • Fix freeze on M1 mac during crawl.
  • Fix issue with Burmese text not displayed correctly.
  • Fix issue where Hebrew text can’t be input into text fields.
  • Fix issue with ‘Visualisations > Inlink Anchor Text Word Cloud’ opening two windows.
  • Fix issue with Forms Based Auth unlock icon not displaying.
  • Fix issue with Forms Based Auth failing for sites with invalid certificates.
  • Fix issue with Overview Report showing incorrect site URL in some situations.
  • Fix issue with Chromium asking for webcam access.
  • Fix issue on macOS where launching via a .seospider/.dbseospider file doesn’t always load the file.
  • Fix issue with Image Preview incorrectly showing 404.
  • Fix issue with PSI CrUX data being duplicated in Origin.
  • Fix various crashes in JavaScript crawling.
  • Fix crash parsing some formats of HTML.
  • Fix crash when re-spidering.
  • Fix crash performing JavaScript crawl with empty user agent.
  • Fix crash selecting URL in master view when all tabs in details view are disabled/hidden.
  • Fix crash in JavaScript crawling when web server sends invalid UTF-8 characters.
  • Fix crash in Overview tab.

Troubleshooting

If set up correctly, this process should be seamless, but occasionally Google might catch wind of what you’re up to and come down to stop your fun with an annoying anti-bot captcha test.

If this happens, just pause your crawl, load up a PSI page in a browser to solve the captcha, then jump back into the tool, highlight the URLs that did not extract any data, and right click > Re-Spider.

If this continues, the likelihood is that you have your crawl speed set too high; if you lower it a little in the options mentioned above, it should put you back on track.

I’ve also noticed a number of comments reporting the PSI page not rendering properly and nothing being extracted. If this happens, it might be worth clearing back to the default config (File > Configuration > Clear to default). Next, make sure the user-agent is set to ScreamingFrog. Finally, ensure you have the following configuration options ticked (Configuration > Spider):

  • Check Images
  • Check CSS
  • Check JavaScript
  • Check SWF
  • Check External Links

If, for any reason, the page is rendering correctly but some scores weren’t extracted, double check the XPaths have been entered correctly and the dropdown is set to ‘Extract Text’. Secondly, it’s worth checking PSI actually has that data by loading it in a browser, as much of the real-world data is only available for high-volume pages.

Full Description

Screaming Frog works on the ‘spider’ principle: it crawls a web resource and collects information about its content. The optimiser can then analyse the collected data, run a quick site audit, check pages for critical errors, and so on.

Let’s look at the main features built into the software.

  • Finds broken pages and redirects;
  • Can be run from the command line;
  • Can extract data using XPath;
  • Supports proxies;
  • Can crawl all subdomains and internal links on a schedule;
  • Exports all images and lets you remove unneeded folders;
  • Every column in the table can be filtered;
  • Shows missing keywords needed for optimisation;
  • Displays anchor texts, as well as the documents containing the URLs that point to those pages;
  • Lets you find specific pages with duplicate titles and Description meta tags;
  • Can find images with missing or overly long alt and title attributes on the img tag;
  • Shows the meta tags that control search bots (crawlers);
  • Can report title lengths in both characters and pixels;
  • Generates a sitemap.xml file with many additional settings (see the sketch after this list);
  • Analyses page speed and load times;
  • Bulk redirect checking, with a URL scanner for redirects;
  • Lets you set the maximum page size to crawl (in kilobytes);
  • Has an In Links tab, where you can view the list of pages linking to a given URL;
  • Works with Robots.txt configurations (the resulting version is treated as canonical by the crawler).
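As referenced in the list above, generating a sitemap.xml boils down to writing URL entries in the sitemaps.org format. A minimal sketch with placeholder URLs and dates:

```python
# Write a minimal sitemap.xml following the sitemaps.org protocol.
# The page URLs, lastmod dates and priorities are placeholders.
import xml.etree.ElementTree as ET

pages = [
    {"loc": "https://example.com/", "lastmod": "2021-02-16", "priority": "1.0"},
    {"loc": "https://example.com/about", "lastmod": "2021-02-10", "priority": "0.5"},
]

urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for page in pages:
    url = ET.SubElement(urlset, "url")
    for tag in ("loc", "lastmod", "priority"):
        ET.SubElement(url, tag).text = page[tag]

ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```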

The features listed above are only a small part of the built-in toolkit. Notably, via SF API Access the software integrates with various statistics services, including Google Analytics and Majestic, so you can view even more data and parameters.

3) Bug Fixes

Version 2.55 includes the following bug fixes –

  • Fixed a crash with invalid regex entered into the exclude feature.
  • Fixed a bug introduced in 2.50 where starting up in list mode, then moving to crawl mode left the crawl depth at 0.
  • Fixed a minor UI issue with the default config menu allowing you to clear default configuration during a crawl. It’s now greyed out at the top level to be consistent with the rest of the File menu.
  • Fixed various window size issues on Macs.
  • Detect buggy saved coordinates for version 2.5 users on the mac so they get full screen on start up.
  • Fixed a couple of other crashes.
  • Fixed a typo in the crash warning pop up. Oops!

That’s everything now for this small release, there’s lots more to come in the next update.

Thanks for all your support.

How to Work with the Screaming Frog SEO Spider

Crawling pages with the SEO Spider does not take much time or effort. After launching the program, you only need to configure it a little to suit your requirements.

The first thing to do is set the crawl mode (the ‘Mode’ menu). Depending on your needs, you can choose one of the available modes.

  • Spider – crawls an entire website.
  • List – crawls a specified list of URLs.
  • SERP – checks page Titles and Descriptions: it calculates the character count and the pixel width and length before the meta tags go live on the site.

Spider Mode

In this mode the whole site is crawled, including images. It’s simple: paste the site URL into the address bar and click ‘Start’.

List Mode

In this mode the program will only crawl the URLs you specify.

You can add a list of URLs to Screaming Frog in several ways. To do so:

  1. Select the ‘List’ mode
  2. Click the ‘Upload List’ button and choose how to add the URLs from the drop-down menu:
    • Upload a file with a list of URLs via ‘From a File…’
    • Enter them manually via ‘Enter Manually…’
    • Copy the URLs to the clipboard and paste them into the program via ‘Paste’

Once the Screaming Frog SEO Spider has crawled your site or the specified URLs, a report with the addresses and information about them will appear in the main window.

The report is structured so that each row is a separate page of the site (or simply a link), and the columns are its characteristics.

Let’s move on to the main tabs of the program, located at the top of the main window. Each tab has its own table of URLs and filters by characteristic.

More about the tabs and what they are responsible for:

  • Internal – usually open by default; it shows the main data collected for the crawled URLs, including the server response. This tab displays the most parameters.
  • External – shows outgoing links.
  • Response Codes – a tab that shows the HTTP response codes of pages.
  • URL – shows problematic URLs. Initially you see all the URLs the program has crawled; to view problematic links, select the issue type in the filter.
  • Page Titles – a tab where you can track pages with problematic titles. As with the previous item, select the issue type in the filter to see the addresses of pages with problematic titles.
  • Meta Description – the same as Page Titles, but for page descriptions (the Description meta tag).
  • Meta Keywords – shows the contents of the Keywords tag for each page. Here you can see pages with duplicate keywords, or pages where the Keywords meta tag is empty.
  • The H1 and H2 tabs respectively show the results for all H1 and H2 headings found on each page of the site.
  • Images – this tab shows the list of images, their size and the number of links pointing to them.
  • Directives – here you can see URL directive types: follow/nofollow, refresh, canonical and others.

As mentioned earlier, each tab has a filter, an export button, a table view switch and a search bar.

In more detail:

  • Filter – the tables on each tab can be filtered by parameters, which in turn depend on the tab type.
  • Export button – the tables (reports) can be exported, with the current filter and sorting applied. (For example, if you have filtered to CSS files only and only CSS files are displayed, then only CSS files will be included in the export.)
  • View – there are two options: a tree view and a list view (the latter is shown in the screenshot).
  • Search bar – the search is global; the entered value is searched across all parameters of the active tab’s report.

At the bottom of the Screaming Frog SEO Spider there is a pane with its own tabs. This pane shows information about each URL selected in the main window. More about each tab:

  • URL Info – basic information about the link
  • Inlinks – incoming links
  • Outlinks – outgoing links
  • Image Info – information about images associated with the selected URL.
  • SERP Snippet – information about the snippet of the selected URL.

And the last thing we will cover in this article is the right-hand panel of the program, which, like the other panels, has tabs located along its top.

More about each of these tabs:

Test Results for 22 Web Crawlers

After numerous tests, we obtained the following results:

Program | Time to crawl 100 pages | Time to crawl 1,000 pages | Time to crawl 10,000 pages | Time to crawl 100,000 pages | Wide set of audited parameters | Flexible data filtering | Crawling of arbitrary URLs | PageRank calculation | Graph visualisation of data | Freeware
Screaming Frog SEO Spider 0:00:08 0:00:45 0:05:35 1:03:30 + + +
Netpeak Spider 0:00:04 0:00:30 0:04:53 0:55:11 + + + + +
SiteAnalyzer 0:00:06 0:00:22 0:06:47 2:04:36 + + + + + +
Forecheck 0:00:15 0:01:12 0:08:02 1:36:14 + +
Sitebulb 0:00:08 0:01:26 0:16:32 2:47:54 + + +
WebSite Auditor 0:00:07 0:00:40 0:05:56 2:36:26 + +
Comparser 0:00:12 +
Visual SEO Studio 0:00:15 0:02:24 0:24:14 4:08:47
Xenu 0:00:12 0:01:22 0:14:41 2:23:32 +
Darcy SEO Checker 0:00:04 0:00:31 0:05:40 0:58:45
LinkChecker 0:00:29 0:00:52 0:03:22 0:52:04 +
PageWeight Desktop 0:00:06 0:00:56 0:17:40 4:23:15 +
Beam Us Up 0:00:08 0:01:03 0:10:18 1:43:03 +
Webbee 0:00:10 0:01:58
WildShark SEO spider 0:00:28 0:07:20 +
Site Visualizer 0:00:11 0:01:58 0:38:15
RiveSolutions SEO Spider 0:00:06 0:00:49 0:08:14 1:55:19
IIS SEO Toolkit 0:00:03 0:00:46 0:07:08 1:02:26 +
Website Link Analyzer 0:00:09 0:02:38 0:24:56 4:33:41 +
A1 Website Analyzer 0:00:24 0:05:32 0:53:15 8:42:11 +
seoBOXX WebsiteAnalyser 0:00:12 0:01:15 0:17:31 3:51:08
Smart SEO Auditor 0:04:46

Note: there is little point dwelling on the 100 and 1,000 page crawls, given the differences in the crawl algorithms of the various programs. The crawl speed over 10,000 and 100,000 pages is more indicative, as it reflects a crawler's more or less stable speed over the long run.

2) SERP Mode For Uploading Page Titles & Descriptions

You can now switch to ‘SERP mode’ and upload page titles and meta descriptions directly into the SEO Spider to calculate pixel widths. There is no crawling involved in this mode, so they do not need to be live on a website.

This means you can export page titles and descriptions from the SEO Spider, make bulk edits in Excel (if that’s your preference, rather than in the tool itself) and then upload them back into the tool to understand how they may appear in Google’s SERPs.

Under ‘reports’, we have a new ‘SERP Summary’ report which is in the format required to re-upload page titles and descriptions. We simply require three headers for ‘URL’, ‘Title’ and ‘Description’.
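If you're preparing that file yourself, it's just a three-column CSV. A small sketch with a made-up row, matching the 'URL', 'Title' and 'Description' headers:

```python
# Build a CSV in the 'SERP Summary' format (URL, Title, Description),
# ready to re-upload in SERP mode after bulk edits. Example row is made up.
import csv

rows = [
    {"URL": "https://example.com/",
     "Title": "Example Shop – Buy Widgets Online",
     "Description": "Buy high-quality widgets online with free delivery."},
]

with open("serp-summary.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["URL", "Title", "Description"])
    writer.writeheader()
    writer.writerows(rows)
```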

The tool will then upload these into the SEO Spider and run the calculations without any crawling.

1) Tree View

You can now switch from the usual ‘list view’ of a crawl to a more traditional directory ‘tree view’ format, while still maintaining the granular detail of each URL crawled that you see in the standard list view.

This additional view will hopefully help provide an alternative perspective when analysing a website’s architecture.

The SEO Spider doesn’t crawl this way natively, so switching to ‘tree view’ from ‘list view’ will take a little time to build, and you may see a progress bar on larger crawls. This has been requested as a feature for quite some time, so thanks to all for the feedback.
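For intuition, building a tree view amounts to folding the flat list of crawled URLs into a nested structure keyed by path segments, which is why it takes a moment on large crawls. A rough sketch with made-up URLs:

```python
# Fold a flat list of crawled URLs into a nested directory tree.
from urllib.parse import urlparse

urls = [
    "https://example.com/",
    "https://example.com/blog/",
    "https://example.com/blog/post-1",
    "https://example.com/products/widgets",
]

tree = {}
for url in urls:
    node = tree.setdefault(urlparse(url).netloc, {})
    for segment in [s for s in urlparse(url).path.split("/") if s]:
        node = node.setdefault(segment, {})

def print_tree(node, depth=0):
    for name, children in sorted(node.items()):
        print("  " * depth + name)
        print_tree(children, depth + 1)

print_tree(tree)
```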

Other Updates

Version 15.0 also includes a number of smaller updates and bug fixes, outlined below.

  • Math Solvers and Practice Problems Google rich result features are now supported in structured data validation.
  • There’s a new ‘Crawl Timestamp’ column in the Internal tab, which should help with automation, reporting and debugging.
  • Project folders within the ‘File > Crawls’ menu are now collapsed by default.
  • The URL bar will now default to HTTPS if you don’t specify the protocol.
  • Fixed a blurry interface issue on high resolution screens on both Windows and Linux (for any scaling setting).
  • Fixed many bugs that are too monotonous to include in any detail. You’re not even reading this last bullet point, so why am I writing it?

That’s all for now. We think these features help raise the SEO Spider to a new level, so hopefully, you find them useful. Please see our tutorial on ‘How To Compare Crawls’ for more on how to use all the features released above. If you experience any issues, please let us know via support and we’ll help.

Thank you to everyone for all their feature requests, feedback, and continued support.

Now, go and download version 15.0 of the Screaming Frog SEO Spider and let us know what you think!

1) Crawl Comparison

You can now compare crawls and see how data, issues and opportunities have changed in tabs and filters over time.

This feature helps track the progress of technical SEO issues and opportunities and provides granular data about what’s changed between the crawls.

To compare, go to ‘File > Crawls’, highlight two crawls, and ‘Select To Compare’.

Or, switch to ‘Mode > Compare’ and click ‘Select Crawl’ via the top menu to pick two crawls you wish to compare. You can adjust the compare configuration (more on that shortly) or just click ‘Compare’.

The crawl comparison analysis will then run and the right-hand overview tab will populate to show current and previous crawl data and changes.

It will identify whether existing URLs found in the previous crawl have moved from or to a tab or filter (‘added’ and ‘removed’), or if a URL is entirely ‘new’, or now ‘missing’ in the latest crawl. This helps better understand progress and if issues are going up or down for URLs you already know about.
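In essence, the four states are set comparisons between the two crawls and the filter membership in each. A small illustrative sketch with made-up data (not the tool's actual implementation):

```python
# Compare a single filter between two crawls: added/removed apply to URLs
# present in both crawls; new/missing apply to the crawls themselves.
previous_crawl = {"https://example.com/", "https://example.com/old"}
current_crawl = {"https://example.com/", "https://example.com/new"}

previous_filter = {"https://example.com/old"}   # e.g. a 'Missing H1' filter
current_filter = {"https://example.com/"}

both = previous_crawl & current_crawl
print("Added:  ", (current_filter - previous_filter) & both)  # entered the filter
print("Removed:", (previous_filter - current_filter) & both)  # left the filter
print("New:    ", current_crawl - previous_crawl)             # not in previous crawl
print("Missing:", previous_crawl - current_crawl)             # gone from latest crawl
```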

You’re able to click on the numbers in the columns to view which URLs have changed, and use the filter on the master window view to toggle between current and previous crawls, or added, new, removed or missing as well.

(Insert joke about the results being more impressive than Arsenal’s)

The compare feature is only available in database storage mode with a licence. If you haven’t already moved, it’s as simple as ‘Config > System > Storage Mode’ and choosing ‘Database Storage’.

Database storage comes with a number of significant benefits, such as improved crawling at scale, auto storing of crawls, super-quick opening and helping to avoid lost crawls if your machine turns off unexpectedly during that 1m URL crawl. Check out our storage modes video guide for an overview.

Small Update – Version 11.1 Released 13th March 2019

We have just released a small update to version 11.1 of the SEO Spider. This release is mainly bug fixes and small improvements –

  • Add 1:1 hreflang URL report, available under ‘Reports > Hreflang > All hreflang URLs’.
  • Cleaned up the preset user-agent list.
  • Fix issue reading XML sitemaps with leading blank lines.
  • Fix issue with parsing and validating structured data.
  • Fix issue with list mode crawling more than the list.
  • Fix issue with list mode crawling of XML sitemaps.
  • Fix issue with scheduling UI unable to delete/edit tasks created by 10.x.
  • Fix issue with visualisations, where the directory tree diagrams were showing the incorrect URL on hover.
  • Fix issue with GA/GSC case insensitivity and trailing slash options.
  • Fix crash when JavaScript crawling with cookies enabled.