
Commit e8706fb7 authored by Markus Heiser, committed by Markus Heiser

[fix] engine & network issues / documentation and type annotations

This patch fixes some quirks and issues related to the engines and the network.
Each engine has its own network and this network was broken for the following
engines[1]:

- archlinux
- bing
- dailymotion
- duckduckgo
- google
- peertube
- startpage
- wikipedia

Since the files have been touched anyway, the type annotations of the engine
modules have also been completed so that the type checker no longer reports
error messages.

Related and (partially) fixed issues:

- [1] https://github.com/searxng/searxng/issues/762#issuecomment-1605323861
- [2] https://github.com/searxng/searxng/issues/2513
- [3] https://github.com/searxng/searxng/issues/2515



Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
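
The name collision described in [1] can be reproduced without SearXNG. The following is a minimal sketch, not the project's code: the module names `broken_engine` and `fixed_engine` are hypothetical stand-ins, and `check_network_attr` only mirrors the idea of the `check_engine_module` function added in this patch. It assumes that an engine's `network` attribute is meant to hold a string, and that a top-level `from searx import network` leaves a module object under that name instead.

```python
import inspect
import types


def check_network_attr(module: types.ModuleType) -> None:
    """Raise TypeError if ``module.network`` is a module instead of a string."""
    obj = getattr(module, "network", None)
    if obj is not None and inspect.ismodule(obj):
        raise TypeError(
            f"type of {module.__name__}.network is a module ({obj.__name__}), expected a string"
        )


# Hypothetical engine whose top-level ``from searx import network`` left a
# module object under the name ``network`` (simulated here with a stand-in).
broken_engine = types.ModuleType("broken_engine")
broken_engine.network = types.ModuleType("searx.network")

# Hypothetical engine that imports inside the function that needs it, so the
# name ``network`` stays free for a string value.
fixed_engine = types.ModuleType("fixed_engine")
fixed_engine.network = "my_network"

check_network_attr(fixed_engine)  # passes silently

try:
    check_network_attr(broken_engine)
except TypeError as exc:
    collision_message = str(exc)
```

Moving the import into the function that needs it, as the archlinux engine does in `fetch_traits`, avoids the collision without renaming anything.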
parent 2e4a4351
+60 −14
@@ -397,14 +397,26 @@ Communication with search engines.
  Global timeout of the requests made to others engines in seconds.  A bigger
  timeout will allow to wait for answers from slow engines, but in consequence
  will slow SearXNG reactivity (the result page may take the time specified in the
  timeout to load). Can be override by :ref:`settings engine`
  timeout to load).  Can be override by ``timeout`` in the :ref:`settings engine`.

``useragent_suffix`` :
  Suffix to the user-agent SearXNG uses to send requests to others engines.  If an
  engine wish to block you, a contact info here may be useful to avoid that.

.. _Pool limit configuration: https://www.python-httpx.org/advanced/#pool-limit-configuration

``pool_maxsize``:
  Number of allowable keep-alive connections, or ``null`` to always allow.  The
  default is 10.  See ``max_keepalive_connections`` `Pool limit configuration`_.

``pool_connections`` :
  Maximum number of allowable connections, or ``null`` for no limits.  The
  default is 100.  See ``max_connections`` `Pool limit configuration`_.

``keepalive_expiry`` :
  Number of seconds to keep a connection in the pool.  By default 5.0 seconds.
  See ``keepalive_expiry`` `Pool limit configuration`_.


.. _httpx proxies: https://www.python-httpx.org/advanced/#http-proxying

@@ -429,15 +441,6 @@ Communication with search engines.
  Number of retry in case of an HTTP error.  On each retry, SearXNG uses an
  different proxy and source ip.

``retry_on_http_error`` :
  Retry request on some HTTP status code.

  Example:

  * ``true`` : on HTTP status code between 400 and 599.
  * ``403`` : on HTTP status code 403.
  * ``[403, 429]``: on HTTP status code 403 and 429.

``enable_http2`` :
  Enable by default. Set to ``false`` to disable HTTP/2.

@@ -455,6 +458,11 @@ Communication with search engines.
``max_redirects`` :
  30 by default. Maximum redirect before it is an error.

``using_tor_proxy`` :
  Using tor proxy (``true``) or not (``false``) for all engines.  The default is
  ``false`` and can be overwritten in the :ref:`settings engine`



.. _settings categories_as_tabs:

@@ -522,13 +530,14 @@ engine is shown. Most of the options have a default value or even are optional.
        use_official_api: true
        require_api_key: true
        results: HTML
     enable_http: false

     # overwrite values from section 'outgoing:'
     enable_http2: false
     retries: 1
     retry_on_http_error: true # or 403 or [404, 429]
     max_connections: 100
     max_keepalive_connections: 10
     keepalive_expiry: 5.0
     using_tor_proxy: false
     proxies:
       http:
         - http://proxy1:8080
@@ -539,6 +548,11 @@ engine is shown. Most of the options have a default value or even are optional.
         - socks5://user:password@proxy3:1080
         - socks5h://user:password@proxy4:1080

     # other network settings
     enable_http: false
     retry_on_http_error: true # or 403 or [404, 429]


``name`` :
  Name that will be used across SearXNG to define this engine.  In settings, on
  the result page...
@@ -579,7 +593,8 @@ engine is shown. Most of the options have a default value or even are optional.
  query all search engines in that category (group).

``timeout`` : optional
  Timeout of the search with the current search engine.  **Be careful, it will
  Timeout of the search with the current search engine.  Overwrites
  ``request_timeout`` from :ref:`settings outgoing`.  **Be careful, it will
  modify the global timeout of SearXNG.**

``api_key`` : optional
@@ -615,6 +630,37 @@ engine is shown. Most of the options have a default value or even are optional.
  - ``ipv4`` set ``local_addresses`` to ``0.0.0.0`` (use only IPv4 local addresses)
  - ``ipv6`` set ``local_addresses`` to ``::`` (use only IPv6 local addresses)

``enable_http`` : optional
  Enable HTTP for this engine (by default only HTTPS is enabled).

``retry_on_http_error`` : optional
  Retry request on some HTTP status code.

  Example:

  * ``true`` : on HTTP status code between 400 and 599.
  * ``403`` : on HTTP status code 403.
  * ``[403, 429]``: on HTTP status code 403 and 429.
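
The three accepted forms of ``retry_on_http_error`` can be read as a single predicate. The sketch below is an illustration of the documented semantics only; the function name `should_retry` is hypothetical and this is not claimed to be how SearXNG implements the check.

```python
from typing import List, Union

RetrySetting = Union[bool, int, List[int], None]


def should_retry(retry_on_http_error: RetrySetting, status_code: int) -> bool:
    """Return True if a response with ``status_code`` should be retried."""
    # bool must be tested before int, because bool is a subclass of int
    if isinstance(retry_on_http_error, bool):
        return retry_on_http_error and 400 <= status_code <= 599
    # a single status code, e.g. 403
    if isinstance(retry_on_http_error, int):
        return status_code == retry_on_http_error
    # a list of status codes, e.g. [403, 429]
    if isinstance(retry_on_http_error, (list, tuple, set)):
        return status_code in retry_on_http_error
    # unset (None) or anything else: never retry
    return False
```

With this reading, `should_retry(True, 404)` and `should_retry([403, 429], 429)` are true, while `should_retry(403, 429)` is false.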

``proxies`` :
  Overwrites proxy settings from :ref:`settings outgoing`.

``using_tor_proxy`` :
  Using tor proxy (``true``) or not (``false``) for this engine.  The default is
  taken from ``using_tor_proxy`` of the :ref:`settings outgoing`.

``max_keepalive_connections`` :
  `Pool limit configuration`_, overwrites value ``pool_maxsize`` from
  :ref:`settings outgoing` for this engine.

``max_connections`` :
  `Pool limit configuration`_, overwrites value ``pool_connections`` from
  :ref:`settings outgoing` for this engine.

``keepalive_expiry`` :
  `Pool limit configuration`_, overwrites value ``keepalive_expiry`` from
  :ref:`settings outgoing` for this engine.
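
Note that the option names differ between the two sections: ``pool_maxsize`` and ``pool_connections`` in ``outgoing:`` correspond to ``max_keepalive_connections`` and ``max_connections`` on the engine. The sketch below shows one way the effective per-engine values could be resolved from plain dictionaries. The helper `effective_pool_limits` and both mapping tables are hypothetical; only the option names and defaults are taken from the text above.

```python
from typing import Any, Dict

# name in section ``outgoing:`` -> name in the engine settings
OUTGOING_TO_ENGINE = {
    "pool_maxsize": "max_keepalive_connections",
    "pool_connections": "max_connections",
    "keepalive_expiry": "keepalive_expiry",
}

# defaults as documented for section ``outgoing:``
OUTGOING_DEFAULTS = {
    "pool_maxsize": 10,
    "pool_connections": 100,
    "keepalive_expiry": 5.0,
}


def effective_pool_limits(outgoing: Dict[str, Any], engine: Dict[str, Any]) -> Dict[str, Any]:
    """Merge global pool limits with the per-engine overrides."""
    limits = {}
    for outgoing_key, engine_key in OUTGOING_TO_ENGINE.items():
        # global value, falling back to the documented default
        default = outgoing.get(outgoing_key, OUTGOING_DEFAULTS[outgoing_key])
        # an explicit engine value (including None for "no limit") wins
        limits[engine_key] = engine.get(engine_key, default)
    return limits
```

For example, `effective_pool_limits({"pool_maxsize": 20}, {"max_connections": 50})` yields 20 keep-alive connections, 50 connections and the default expiry of 5.0 seconds.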

.. note::

   A few more options are possible, but they are pretty specific to some
+13 −1
@@ -17,7 +17,7 @@


from __future__ import annotations
from typing import Union, Dict, List, Callable, TYPE_CHECKING
from typing import List, Callable, TYPE_CHECKING

if TYPE_CHECKING:
    from searx.enginelib import traits
@@ -134,3 +134,15 @@ class Engine: # pylint: disable=too-few-public-methods
          require_api_key: true
          results: HTML
    """

    using_tor_proxy: bool
    """Using tor proxy (``true``) or not (``false``) for this engine."""

    send_accept_language_header: bool
    """When this option is activated, the language (locale) that is selected by
    the user is used to build and send a ``Accept-Language`` header in the
    request to the origin search engine."""

    tokens: List[str]
    """A list of secret tokens to make this engine *private*, more details see
    :ref:`private engines`."""
+3 −3
@@ -13,6 +13,7 @@ used.
from __future__ import annotations
import json
import dataclasses
import types
from typing import Dict, Iterable, Union, Callable, Optional, TYPE_CHECKING
from typing_extensions import Literal, Self

@@ -82,8 +83,7 @@ class EngineTraits:
    """

    custom: Dict[str, Union[Dict[str, Dict], Iterable[str]]] = dataclasses.field(default_factory=dict)
    """A place to store engine's custom traits, not related to the SearXNG core

    """A place to store engine's custom traits, not related to the SearXNG core.
    """

    def get_language(self, searxng_locale: str, default=None):
@@ -228,7 +228,7 @@ class EngineTraitsMap(Dict[str, EngineTraits]):

        return obj

    def set_traits(self, engine: Engine):
    def set_traits(self, engine: Engine | types.ModuleType):
        """Set traits in a :py:obj:`Engine` namespace.

        :param engine: engine instance build by :py:func:`searx.engines.load_engine`
+43 −26
@@ -17,7 +17,9 @@ import sys
import copy
from os.path import realpath, dirname

from typing import TYPE_CHECKING, Dict, Optional
from typing import TYPE_CHECKING, Dict
import types
import inspect

from searx import logger, settings
from searx.utils import load_module
@@ -28,21 +30,23 @@ if TYPE_CHECKING:
logger = logger.getChild('engines')
ENGINE_DIR = dirname(realpath(__file__))
ENGINE_DEFAULT_ARGS = {
    # Common options in the engine module
    "engine_type": "online",
    "inactive": False,
    "disabled": False,
    "timeout": settings["outgoing"]["request_timeout"],
    "shortcut": "-",
    "categories": ["general"],
    "paging": False,
    "safesearch": False,
    "time_range_support": False,
    "safesearch": False,
    # settings.yml
    "categories": ["general"],
    "enable_http": False,
    "using_tor_proxy": False,
    "shortcut": "-",
    "timeout": settings["outgoing"]["request_timeout"],
    "display_error_messages": True,
    "disabled": False,
    "inactive": False,
    "about": {},
    "using_tor_proxy": False,
    "send_accept_language_header": False,
    "tokens": [],
    "about": {},
}
# set automatically when an engine does not have any tab category
DEFAULT_CATEGORY = 'other'
@@ -51,7 +55,7 @@ DEFAULT_CATEGORY = 'other'
# Defaults for the namespace of an engine module, see :py:func:`load_engine`

categories = {'general': []}
engines: Dict[str, Engine] = {}
engines: Dict[str, Engine | types.ModuleType] = {}
engine_shortcuts = {}
"""Simple map of registered *shortcuts* to name of the engine (or ``None``).

@@ -63,7 +67,19 @@ engine_shortcuts = {}
"""


def load_engine(engine_data: dict) -> Optional[Engine]:
def check_engine_module(module: types.ModuleType):
    # probe unintentional name collisions / for example name collisions caused
    # by import statements in the engine module ..

    # network: https://github.com/searxng/searxng/issues/762#issuecomment-1605323861
    obj = getattr(module, 'network', None)
    if obj and inspect.ismodule(obj):
        msg = f'type of {module.__name__}.network is a module ({obj.__name__}), expected a string'
        # logger.error(msg)
        raise TypeError(msg)


def load_engine(engine_data: dict) -> Engine | types.ModuleType | None:
    """Load engine from ``engine_data``.

    :param dict engine_data:  Attributes from YAML ``settings:engines/<engine>``
@@ -100,19 +116,20 @@ def load_engine(engine_data: dict) -> Optional[Engine]:
        engine_data['name'] = engine_name

    # load_module
    engine_module = engine_data.get('engine')
    if engine_module is None:
    module_name = engine_data.get('engine')
    if module_name is None:
        logger.error('The "engine" field is missing for the engine named "{}"'.format(engine_name))
        return None
    try:
        engine = load_module(engine_module + '.py', ENGINE_DIR)
        engine = load_module(module_name + '.py', ENGINE_DIR)
    except (SyntaxError, KeyboardInterrupt, SystemExit, SystemError, ImportError, RuntimeError):
        logger.exception('Fatal exception in engine "{}"'.format(engine_module))
        logger.exception('Fatal exception in engine "{}"'.format(module_name))
        sys.exit(1)
    except BaseException:
        logger.exception('Cannot load engine "{}"'.format(engine_module))
        logger.exception('Cannot load engine "{}"'.format(module_name))
        return None

    check_engine_module(engine)
    update_engine_attributes(engine, engine_data)
    update_attributes_for_tor(engine)

@@ -153,18 +170,18 @@ def set_loggers(engine, engine_name):
            and not hasattr(module, "logger")
        ):
            module_engine_name = module_name.split(".")[-1]
            module.logger = logger.getChild(module_engine_name)
            module.logger = logger.getChild(module_engine_name)  # type: ignore


def update_engine_attributes(engine: Engine, engine_data):
def update_engine_attributes(engine: Engine | types.ModuleType, engine_data):
    # set engine attributes from engine_data
    for param_name, param_value in engine_data.items():
        if param_name == 'categories':
            if isinstance(param_value, str):
                param_value = list(map(str.strip, param_value.split(',')))
            engine.categories = param_value
            engine.categories = param_value  # type: ignore
        elif hasattr(engine, 'about') and param_name == 'about':
            engine.about = {**engine.about, **engine_data['about']}
            engine.about = {**engine.about, **engine_data['about']}  # type: ignore
        else:
            setattr(engine, param_name, param_value)

@@ -174,10 +191,10 @@ def update_engine_attributes(engine: Engine, engine_data):
            setattr(engine, arg_name, copy.deepcopy(arg_value))


def update_attributes_for_tor(engine: Engine) -> bool:
def update_attributes_for_tor(engine: Engine | types.ModuleType):
    if using_tor_proxy(engine) and hasattr(engine, 'onion_url'):
        engine.search_url = engine.onion_url + getattr(engine, 'search_path', '')
        engine.timeout += settings['outgoing'].get('extra_proxy_timeout', 0)
        engine.search_url = engine.onion_url + getattr(engine, 'search_path', '')  # type: ignore
        engine.timeout += settings['outgoing'].get('extra_proxy_timeout', 0)  # type: ignore


def is_missing_required_attributes(engine):
@@ -193,12 +210,12 @@ def is_missing_required_attributes(engine):
    return missing


def using_tor_proxy(engine: Engine):
def using_tor_proxy(engine: Engine | types.ModuleType):
    """Return True if the engine configuration declares to use Tor."""
    return settings['outgoing'].get('using_tor_proxy') or getattr(engine, 'using_tor_proxy', False)


def is_engine_active(engine: Engine):
def is_engine_active(engine: Engine | types.ModuleType):
    # check if engine is inactive
    if engine.inactive is True:
        return False
@@ -210,7 +227,7 @@ def is_engine_active(engine: Engine):
    return True


def register_engine(engine: Engine):
def register_engine(engine: Engine | types.ModuleType):
    if engine.name in engines:
        logger.error('Engine config error: ambiguous name: {0}'.format(engine.name))
        sys.exit(1)
+13 −12
@@ -14,7 +14,6 @@ from urllib.parse import urlencode, urljoin, urlparse
import lxml
import babel

from searx import network
from searx.utils import extract_text, eval_xpath_list, eval_xpath_getindex
from searx.enginelib.traits import EngineTraits
from searx.locales import language_tag
@@ -45,13 +44,13 @@ main_wiki = 'wiki.archlinux.org'
def request(query, params):

    sxng_lang = params['searxng_locale'].split('-')[0]
    netloc = traits.custom['wiki_netloc'].get(sxng_lang, main_wiki)
    title = traits.custom['title'].get(sxng_lang, 'Special:Search')
    netloc: str = traits.custom['wiki_netloc'].get(sxng_lang, main_wiki)  # type: ignore
    title: str = traits.custom['title'].get(sxng_lang, 'Special:Search')  # type: ignore
    base_url = 'https://' + netloc + '/index.php?'
    offset = (params['pageno'] - 1) * 20

    if netloc == main_wiki:
        eng_lang: str = traits.get_language(sxng_lang, 'English')
        eng_lang: str = traits.get_language(sxng_lang, 'English')  # type: ignore
        query += ' (' + eng_lang + ')'
    elif netloc == 'wiki.archlinuxcn.org':
        base_url = 'https://' + netloc + '/wzh/index.php?'
@@ -71,11 +70,11 @@ def request(query, params):
def response(resp):

    results = []
    dom = lxml.html.fromstring(resp.text)
    dom = lxml.html.fromstring(resp.text)  # type: ignore

    # get the base URL for the language in which request was made
    sxng_lang = resp.search_params['searxng_locale'].split('-')[0]
    netloc = traits.custom['wiki_netloc'].get(sxng_lang, main_wiki)
    netloc: str = traits.custom['wiki_netloc'].get(sxng_lang, main_wiki)  # type: ignore
    base_url = 'https://' + netloc + '/index.php?'

    for result in eval_xpath_list(dom, '//ul[@class="mw-search-results"]/li'):
@@ -83,7 +82,7 @@ def response(resp):
        content = extract_text(result.xpath('.//div[@class="searchresult"]'))
        results.append(
            {
                'url': urljoin(base_url, link.get('href')),
                'url': urljoin(base_url, link.get('href')),  # type: ignore
                'title': extract_text(link),
                'content': content,
            }
@@ -114,6 +113,8 @@ def fetch_traits(engine_traits: EngineTraits):
       },

    """
    # pylint: disable=import-outside-toplevel
    from searx.network import get  # see https://github.com/searxng/searxng/issues/762

    engine_traits.custom['wiki_netloc'] = {}
    engine_traits.custom['title'] = {}
@@ -125,11 +126,11 @@ def fetch_traits(engine_traits: EngineTraits):
        'zh': 'Special:搜索',
    }

    resp = network.get('https://wiki.archlinux.org/')
    if not resp.ok:
    resp = get('https://wiki.archlinux.org/')
    if not resp.ok:  # type: ignore
        print("ERROR: response from wiki.archlinix.org is not OK.")

    dom = lxml.html.fromstring(resp.text)
    dom = lxml.html.fromstring(resp.text)  # type: ignore
    for a in eval_xpath_list(dom, "//a[@class='interlanguage-link-target']"):

        sxng_tag = language_tag(babel.Locale.parse(a.get('lang'), sep='-'))
@@ -143,9 +144,9 @@ def fetch_traits(engine_traits: EngineTraits):
                print("ERROR: title tag from %s (%s) is unknown" % (netloc, sxng_tag))
                continue
            engine_traits.custom['wiki_netloc'][sxng_tag] = netloc
            engine_traits.custom['title'][sxng_tag] = title
            engine_traits.custom['title'][sxng_tag] = title  # type: ignore

        eng_tag = extract_text(eval_xpath_list(a, ".//span"))
        engine_traits.languages[sxng_tag] = eng_tag
        engine_traits.languages[sxng_tag] = eng_tag  # type: ignore

    engine_traits.languages['en'] = 'English'