.. docscraper documentation master file, created by
   sphinx-quickstart on Sat Mar  6 12:37:21 2021.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

DocScraper
======================================

The ``docscraper`` package is a ``scrapy`` spider for crawling a give set of
websites and dowloading all available documents with a given set of file
extensions.

Getting Started
---------------

You can get started by downloading the package with ``pip``::

$ pip install docscraper

Once the package is installed, you can use it with scrapy directly in your
Python script to download files from websites as follows:

.. doctest::

   >>> import docscraper
   >>> allowed_domains = ["books.toscrape.com"]
   >>> start_urls = ["https://books.toscrape.com"]
   >>> extensions = [".html", ".pdf", ".docx", ".doc", ".svg"]
   >>> docscraper.crawl(allowed_domains, start_urls, extensions=extensions)


.. toctree::
   :maxdepth: 2
   :caption: Contents:


Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`