GitHub repository README
Introduction
This is a Command Line Interface tool to fetch web pages, save them locally, and analyse them using class name attributes to create ‘component usage’ statistics.
Theory
Before running the tool, it's recommended to understand how it works.
Defining A Component
The input is a partial class name. The output is the single class name with the highest count among all class names that contain that partial class name.
Example: If we want statistics on a Hero component, we look for any class that matches the pattern (at the start, at the end, or anywhere within) using the .hero class pattern defined in component_map.json. So, if we found .hero__inner and .hero__wrapper--123, we would use whichever class name was found most often to create the Hero statistics.
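The selection rule above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name `resolveComponentClass` and the shape of `classCounts` are assumptions, and ties are resolved in favour of the first class encountered.

```javascript
// Hypothetical helper: pick the class that represents a component.
// `classCounts` maps class name -> total occurrences across all pages.
// `pattern` is the partial class name, e.g. ".hero".
function resolveComponentClass(classCounts, pattern) {
  // Drop a leading "." so ".hero" can match the class name "hero__inner".
  const partial = pattern.replace(/^\./, "");
  let best = null;
  for (const [className, count] of Object.entries(classCounts)) {
    // Match at the start, at the end, or anywhere within the class name.
    if (!className.includes(partial)) continue;
    if (best === null || count > best.count) best = { className, count };
  }
  return best; // null when no class contains the pattern
}

const exampleCounts = { "hero__inner": 12, "hero__wrapper--123": 30, "footer": 50 };
console.log(resolveComponentClass(exampleCounts, ".hero"));
// { className: 'hero__wrapper--123', count: 30 }
```

Note that `footer` has the highest count overall but is ignored because it does not contain the partial class name.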
Generating Statistics
The tool is broken into a number of steps to create statistics on a set of components.
0. Before running, the input folder needs to contain two important files:
- A catalog.json that contains the manually created list of pages to audit. Each has a unique id and a market category.
- A component_map.json that contains the list of components to find statistics on. Each has a title and the class pattern selector.
- Firstly, setup.js will create the output folder structure with empty directories ready for files to be written to.
- Then get-pages.js will download all pages listed in the catalog.json file into the page_html folder. It also takes a screenshot of each page and adds it to the page_screenshots folder.
- Once pages are stored locally, the search.js script can be used to find all instances of a class, to help identify components and where they are and are not used. This is helpful for testing, refining and reviewing class names and screen grabs.
- Then scan-pages.js will crawl the stored pages and create a report per page in page_reports, listing every class found along with data to trace the market, url, id etc.
- Then report-pages.js will pull together statistics on every class across all pages and identify overall and per-market use into pages_report.json.
- Lastly, report-components.js will pull together statistics on the components supplied in component_map.json (please see the Defining A Component section above for important details on how this works).
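To make the two input files from step 0 more concrete, here is a hedged sketch of what their entries might look like, plus a small pre-flight check. Only `id`, `market` and `title` are named in this README. The `url` and `selector` field names, the array layout and the `validateInputs` helper are assumptions for illustration.

```javascript
// Hypothetical shapes for the parsed input files (field names partly assumed).
const exampleCatalog = [
  { id: "uk-home", market: "uk", url: "https://example.com/uk/" },
  { id: "de-home", market: "de", url: "https://example.com/de/" },
];
const exampleComponentMap = [{ title: "Hero", selector: ".hero" }];

// Return a list of problems so bad input is caught before any page is fetched.
function validateInputs(catalog, componentMap) {
  const problems = [];
  const seenIds = new Set();
  for (const page of catalog) {
    if (!page.id) problems.push("catalog entry without id");
    else if (seenIds.has(page.id)) problems.push(`duplicate id: ${page.id}`);
    else seenIds.add(page.id);
    if (!page.market) problems.push(`missing market: ${page.id}`);
  }
  for (const component of componentMap) {
    if (!component.title || !component.selector) {
      problems.push(`incomplete component: ${JSON.stringify(component)}`);
    }
  }
  return problems;
}

console.log(validateInputs(exampleCatalog, exampleComponentMap)); // []
```

Checking for duplicate ids up front matters because every later step names its output files after the page id.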
Setup
Create empty directories ready to populate.
Generates:
- Empty directory as ./output
- Empty directory as ./output/search_results
- Empty directory as ./output/page_screenshots
- Empty directory as ./output/page_html
- Empty directory as ./output/page_reports
npm run setup
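As a rough idea of what this step involves, the directory layout above could be created with Node's built-in `fs` module. This is a sketch only: `createOutputDirs` and its `root` parameter are hypothetical, and it does not empty directories that already exist, which the real setup.js may or may not do.

```javascript
const fs = require("fs");
const path = require("path");

// Sub-directory names taken from the list above.
const OUTPUT_DIRS = ["search_results", "page_screenshots", "page_html", "page_reports"];

// Create <root>/output and its sub-directories; returns the paths it ensured.
function createOutputDirs(root = ".") {
  const ensured = [];
  for (const dir of OUTPUT_DIRS) {
    const target = path.join(root, "output", dir);
    // `recursive: true` also creates the parent ./output directory
    // and does not throw if the directory already exists.
    fs.mkdirSync(target, { recursive: true });
    ensured.push(target);
  }
  return ensured;
}
```

Calling `createOutputDirs(".")` from the project root would produce the structure listed under "Generates".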
Get Pages
Take HTML pages offline to make processing faster, safer and more flexible.
Crawl a list of URLs and download each one, also taking a full-page screenshot (the cookie alert is confirmed first so that it is hidden in the screenshot).
Generates:
- Contents of <body> for each page as ./output/page_html/{id}.html
- Screenshot of each page as ./output/page_screenshots/{id}.png
npm run get-pages
Scan Pages
Reduce pages down to an array of class names with additional page details.
Create a raw data set about each page in turn. Data generated includes:
- Page URL.
- Page ID (for cross-referencing data at any later stage).
- Market.
- Language.
- List of every class on the page.
- Total count of classes.
Generates:
- A report per page as ./output/page_reports/{id}.json
npm run scan-pages
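The per-page report described above could be built along these lines. This is a dependency-free sketch under stated assumptions: the function names and report field names are hypothetical, repeated classes are kept in the list so later steps can count them, and a regular expression stands in for whatever HTML parsing scan-pages.js really uses (a DOM parser would be more robust).

```javascript
// Collect every class name from the class="..." attributes in an HTML string.
function extractClasses(html) {
  const classes = [];
  // Require whitespace before "class" so attributes like data-class are skipped.
  const attribute = /\sclass\s*=\s*(?:"([^"]*)"|'([^']*)')/gi;
  for (const match of html.matchAll(attribute)) {
    const value = match[1] ?? match[2] ?? "";
    classes.push(...value.split(/\s+/).filter(Boolean));
  }
  return classes;
}

// `page` is a hypothetical catalog entry; `html` is the stored page body.
function buildPageReport(page, html) {
  const classes = extractClasses(html);
  return {
    url: page.url,
    id: page.id,
    market: page.market,
    language: page.language,
    classes,
    totalClasses: classes.length,
  };
}

const exampleHtml = '<div class="hero hero__inner"><p class="lead">Hi</p></div>';
const examplePage = { id: "uk-home", url: "https://example.com/uk/", market: "uk", language: "en" };
console.log(buildPageReport(examplePage, exampleHtml));
```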
Report Pages
Analyse all pages to get overall statistics.
Collate the raw data from every page report into a single report. The data on every class used across all pages includes:
- Class name.
- Total count across all pages.
- Total number of pages with class found.
- Per market:
  - Total count.
  - Total number of pages used.
Generates:
- A report as ./output/page_reports.json
npm run report-pages
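The collation described above amounts to a group-and-count over the per-page reports. The sketch below shows one way to do it. The input shape `{ id, market, classes }` and the output field names are assumptions, not the tool's documented format.

```javascript
// Merge per-page reports into one entry per class name.
function collateClassStats(pageReports) {
  const stats = new Map();
  for (const report of pageReports) {
    // Count occurrences of each class within this page first,
    // so a page is only counted once per class in `pages`.
    const perPage = new Map();
    for (const name of report.classes) {
      perPage.set(name, (perPage.get(name) || 0) + 1);
    }
    for (const [name, count] of perPage) {
      if (!stats.has(name)) {
        stats.set(name, { className: name, count: 0, pages: 0, markets: {} });
      }
      const entry = stats.get(name);
      if (!entry.markets[report.market]) {
        entry.markets[report.market] = { count: 0, pages: 0 };
      }
      const market = entry.markets[report.market];
      entry.count += count;
      entry.pages += 1;
      market.count += count;
      market.pages += 1;
    }
  }
  return [...stats.values()];
}

const exampleReports = [
  { id: "uk-home", market: "uk", classes: ["hero", "hero", "btn"] },
  { id: "de-home", market: "de", classes: ["hero"] },
];
console.log(JSON.stringify(collateClassStats(exampleReports), null, 2));
```

For the example input, `hero` ends up with a total count of 3 across 2 pages, split into 2 occurrences on 1 page for `uk` and 1 occurrence on 1 page for `de`.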
Report Components
Analyse all pages to get component statistics.
Collate the raw data about every component into a single report. The data on each component used across all pages includes:
- Component name.
- Class name.
- Total number of pages.
- Total count across all pages.
- Usage percentage across all pages.
- Per market:
  - Total count.
  - Total number of pages.
  - Usage percentage.
Generates:
- A report as ./output/component_reports.json
npm run report-components
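One plausible way to derive the component figures listed above from the per-class statistics is sketched here. Everything in it is an assumption for illustration: the input shapes, the `selector` field name, and in particular the definition of usage percentage as "pages containing the class divided by pages audited", rounded to one decimal place.

```javascript
// `classStats`: array of { className, count, pages, markets: { [market]: { count, pages } } }.
// `pagesPerMarket`: maps market -> number of audited pages in that market.
function componentReport(component, classStats, pagesPerMarket) {
  const partial = component.selector.replace(/^\./, "");
  // Choose the most counted class that contains the partial class name.
  const match = classStats
    .filter((stat) => stat.className.includes(partial))
    .sort((a, b) => b.count - a.count)[0];
  if (!match) return null;

  const percent = (pages, total) =>
    total ? Math.round((pages / total) * 1000) / 10 : 0;
  const totalPages = Object.values(pagesPerMarket).reduce((sum, n) => sum + n, 0);

  const markets = {};
  for (const [market, total] of Object.entries(pagesPerMarket)) {
    const used = match.markets[market] || { count: 0, pages: 0 };
    markets[market] = {
      count: used.count,
      pages: used.pages,
      usage: percent(used.pages, total),
    };
  }
  return {
    component: component.title,
    className: match.className,
    pages: match.pages,
    count: match.count,
    usage: percent(match.pages, totalPages),
    markets,
  };
}

const exampleStats = [
  { className: "hero__inner", count: 3, pages: 2, markets: { uk: { count: 2, pages: 1 }, de: { count: 1, pages: 1 } } },
  { className: "hero__title", count: 1, pages: 1, markets: { uk: { count: 1, pages: 1 } } },
];
console.log(componentReport({ title: "Hero", selector: ".hero" }, exampleStats, { uk: 2, de: 2 }));
```

In the example, `hero__inner` is chosen over `hero__title` because it has the higher count, and with 2 of 4 pages using it the overall usage comes out at 50.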
Search
Find all instances of a class to investigate its use across all pages.
Use a class to deep dive and generate a report on the pages where it is being used (and optionally where it is not being used), as well as bundle screenshots together for review by eye. Data generated through search includes:
- Report of pages including/excluding class with each:
  - Page url.
  - Page id.
  - Page market.
- Screenshot of each page including/excluding class.
Generates:
- Directory of search results as ./output/search_results/{name-of-folder}/ with:
  - Directory includes/ with a report.json and a screenshot per page as {id}.png of pages where the class is found.
  - Directory excludes/ with a report.json and a screenshot per page as {id}.png of pages where the class is not found.
npm run search class="{name-of-class}" name="{name-of-folder}" excludes="true"
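The core of the search, splitting pages into those that include the class and those that do not, can be sketched as below. The function name, the page report shape and the exact-match comparison are assumptions. Copying the screenshots into includes/ and excludes/ is left out.

```javascript
// Split page reports by whether they use `className` (exact match assumed).
// `withExcludes` mirrors the optional excludes="true" argument.
function searchClass(pageReports, className, withExcludes = false) {
  const summarise = ({ url, id, market }) => ({ url, id, market });
  const includes = [];
  const excludes = [];
  for (const report of pageReports) {
    if (report.classes.includes(className)) includes.push(summarise(report));
    else if (withExcludes) excludes.push(summarise(report));
  }
  return withExcludes ? { includes, excludes } : { includes };
}

const examplePages = [
  { id: "uk-home", url: "https://example.com/uk/", market: "uk", classes: ["hero", "btn"] },
  { id: "de-home", url: "https://example.com/de/", market: "de", classes: ["btn"] },
];
console.log(searchClass(examplePages, "hero", true));
```

Each entry in `includes` and `excludes` carries the page id, so the matching screenshot can be looked up as {id}.png.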