
Commit 3627eb4a authored by Nicolas Gelot

Merge remote-tracking branch 'asciimoo/master'

parents 8c261be7 9c2679c3
@@ -28,8 +28,6 @@ stages:
jobs:
include:
- python: "2.7"
env: PY=2
- python: "3.5"
- python: "3.6"
- python: "3.7"
......
Searx was created by Adam Tauber and is maintained by Adam Tauber, Alexandre Flament, Noémi Ványi, @pofilo and Markus Heiser.
Searx was created by Adam Tauber and is maintained by Adam Tauber, Alexandre Flament, Noémi Ványi, @pofilo, Gaspard d'Hautefeuille and Markus Heiser.
Major contributing authors:
@@ -124,3 +124,12 @@ generally made searx better:
- @CaffeinatedTech
- Robin Schneider @ypid
- @splintah
- Lukas van den Berk @lukasvdberk
- @piplongrun
- Jason Kaltsikis @jjasonkal
- Sion Kazama @KazamaSion
- @resynth1943
- Mostafa Ahangarha @ahangarha
- @gordon-quad
- Sophie Tauchert @999eagle
- @bauruine
0.17.0 2020.07.09
=================
- New engines
- eTools
- Wikibooks
- Wikinews
- Wikiquote
- Wikisource
- Wiktionary
- Wikiversity
- Wikivoyage
- Rubygems
- Engine fixes (google, google images, startpage, gigablast, yacy)
- Private engines introduced - more details: https://asciimoo.github.io/searx/blog/private-engines.html
- Greatly improved documentation - check it at https://asciimoo.github.io/searx
- Added autofocus to all search inputs
- CSP friendly oscar theme
- Added option to hide engine errors with the `display_error_messages` engine option (true/false values, default is true) - see the sketch after this list
- Tons of accessibility fixes - see https://github.com/asciimoo/searx/issues/350 for details
- More flexible branding options: configurable vcs/issue tracker links
- Added "disable all" & "allow all" options to preferences engine select
- Autocomplete keyboard navigation fixes
- Configurable category order
- Wrap long lines in infoboxes
- Added RSS subscription link
- Added routing directions to OSM results
- Added author and length attributes to youtube videos
- Fixed image stretch with mobile viewport in oscar theme
- Added translatable JS strings
- Better HTML annotations - engine names and endpoints are available as classes
- RTL text fixes in oscar theme
- Handle weights in accept-language HTTP headers
- Added answerer results to rss/csv output
- Added new autocomplete backends to settings.yml
- Updated opensearch.xml
- Fixed custom locale setting from settings.yml
- Translation updates
- Removed engines: faroo
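A minimal sketch of the `display_error_messages` option in an engine entry of settings.yml; the engine named here is only an illustration::

    - name: wikipedia
      engine: wikipedia
      display_error_messages: false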
Special thanks to `NLNet <https://nlnet.nl>`__ for sponsoring multiple features of this release.
Special thanks to https://www.accessibility.nl/english for making the accessibility audit.
News
~~~~
- @HLFH joined the maintainer team
- Dropped Python2 support
0.16.0 2020.01.30
=================
......
@@ -166,29 +166,29 @@ quiet_cmd_grunt = GRUNT $2
cmd_grunt = PATH="$$(npm bin):$$PATH" \
grunt --gruntfile "$2"
themes.oscar:
themes.oscar: node.env
$(Q)echo '[!] build oscar theme'
$(call cmd,grunt,searx/static/themes/oscar/gruntfile.js)
themes.simple:
themes.simple: node.env
$(Q)echo '[!] build simple theme'
$(call cmd,grunt,searx/static/themes/simple/gruntfile.js)
themes.legacy:
themes.legacy: node.env
$(Q)echo '[!] build legacy theme'
$(call cmd,lessc,themes/legacy/less/style-rtl.less,themes/legacy/css/style-rtl.css)
$(call cmd,lessc,themes/legacy/less/style.less,themes/legacy/css/style.css)
themes.courgette:
themes.courgette: node.env
$(Q)echo '[!] build courgette theme'
$(call cmd,lessc,themes/courgette/less/style.less,themes/courgette/css/style.css)
$(call cmd,lessc,themes/courgette/less/style-rtl.less,themes/courgette/css/style-rtl.css)
themes.pixart:
themes.pixart: node.env
$(Q)echo '[!] build pixart theme'
$(call cmd,lessc,themes/pix-art/less/style.less,themes/pix-art/css/style.css)
themes.bootstrap:
themes.bootstrap: node.env
$(call cmd,lessc,less/bootstrap/bootstrap.less,css/bootstrap.min.css)
themes.eelo:
@@ -228,6 +228,7 @@ test.pylint: pyenvinstall
$(call cmd,pylint,\
searx/preferences.py \
searx/testing.py \
searx/engines/gigablast.py \
)
endif
@@ -248,7 +249,7 @@ test.sh:
test.pep8: pyenvinstall
@echo "TEST pep8"
$(Q)$(PY_ENV_ACT); pep8 --exclude=searx/static --max-line-length=120 --ignore "E402,W503" searx tests
$(Q)$(PY_ENV_ACT); pep8 --exclude='searx/static, searx/engines/gigablast.py' --max-line-length=120 --ignore "E402,W503" searx tests
test.unit: pyenvinstall
@echo "TEST tests/unit"
......
@@ -11,7 +11,6 @@ Spot was forked from searx: read [documentation](https://asciimoo.github.io/sear
* eelo theme
* redis cache on http requests (TTL 1 day)
* docker packaging designed to be production ready
* better locale support
## Architecture
......
@@ -108,6 +108,7 @@ restart the uwsgi application.
:start-after: START searx uwsgi-description ubuntu-20.04
:end-before: END searx uwsgi-description ubuntu-20.04
.. hotfix: a bug in group-tab needs this comment
.. group-tab:: Arch Linux
@@ -115,6 +116,7 @@ restart the uwsgi application.
:start-after: START searx uwsgi-description arch
:end-before: END searx uwsgi-description arch
.. hotfix: a bug in group-tab needs this comment
.. group-tab:: Fedora / RHEL
@@ -128,22 +130,21 @@ restart the uwsgi application.
.. group-tab:: Ubuntu / debian
.. kernel-include:: $DOCS_BUILD/includes/searx.rst
:code: ini
:start-after: START searx uwsgi-appini ubuntu-20.04
:end-before: END searx uwsgi-appini ubuntu-20.04
.. hotfix: a bug in group-tab needs this comment
.. group-tab:: Arch Linux
.. kernel-include:: $DOCS_BUILD/includes/searx.rst
:code: ini
:start-after: START searx uwsgi-appini arch
:end-before: END searx uwsgi-appini arch
.. hotfix: a bug in group-tab needs this comment
.. group-tab:: Fedora / RHEL
.. kernel-include:: $DOCS_BUILD/includes/searx.rst
:code: ini
:start-after: START searx uwsgi-appini fedora
:end-before: END searx uwsgi-appini fedora
@@ -6,6 +6,7 @@ Blog
:maxdepth: 2
:caption: Contents
lxcdev-202006
python3
admin
intro-offline
......
@@ -13,7 +13,7 @@ Private engines
To solve this issue private engines were introduced in :pull:`1823`.
A new engine option named `tokens` was added. It expects a list
of strings. If the user making a request presents one of the tokens
of an engine, he/she is able to access information about the engine
of an engine, they can access information about the engine
and make search requests.
Example configuration to restrict access to the Arch Linux Wiki engine:
......
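A minimal sketch of such an entry in settings.yml, with a placeholder token value::

    - name: arch linux wiki
      engine: archlinux
      tokens: ['my-secret-token']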
@@ -30,6 +30,14 @@ Example plugin
ctx['search'].suggestions.add('example')
return True
Register your plugin
====================
To enable your plugin, register it in
searx > plugins > __init__.py
by adding a line at the bottom of the file:
``plugins.register(name_of_python_file)``
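For illustration, a minimal plugin module matching the example plugin above; the file name and the metadata values are placeholders::

    # searx/plugins/example_plugin.py -- a minimal sketch
    name = 'Example plugin'
    description = 'Extends the suggestions with the word "example"'
    default_on = False  # disabled by default

    # post-search hook: request is the flask request object,
    # ctx holds the local context of the search
    def post_search(request, ctx):
        ctx['search'].suggestions.add('example')
        return True

After importing the module in searx/plugins/__init__.py, it is enabled with ``plugins.register(example_plugin)``.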
Plugin entry points
===================
......
@@ -81,7 +81,7 @@ Parameters
Theme of instance.
Please note, available themes depend on an instance. It is possible that an
instance administrator deleted, created or renamed themes on his/her instance.
instance administrator deleted, created or renamed themes on their instance.
See the available options in the preferences page of the instance.
``oscar-style`` : default ``logicodev``
@@ -91,7 +91,7 @@ Parameters
``oscar``.
Please note, available styles depend on an instance. It is possible that an
instance administrator deleted, created or renamed styles on his/her
instance administrator deleted, created or renamed styles on their
instance. See the available options in the preferences page of the instance.
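For illustration, such preferences can be passed as parameters of a search request; the instance URL is a placeholder::

    https://searx.example.org/search?q=privacy&theme=oscar&oscar-style=logicodev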
``enabled_plugins`` : optional
......
@@ -44,9 +44,9 @@ hidden from visited result pages.
What are the consequences of using public instances?
----------------------------------------------------
If someone uses a public instance, he/she has to trust the administrator of that
If someone uses a public instance, they have to trust the administrator of that
instance. This means that the user of the public instance does not know whether
his/her requests are logged, aggregated and sent or sold to a third party.
their requests are logged, aggregated and sent or sold to a third party.
Also, public instances without proper protection are more vulnerable to abuse of
the search service. In this case the external service in exchange returns
......
@@ -47,9 +47,9 @@ one**::
*Good to know ...*
Eeach container shares the root folder of the repository and the
command ``utils/lxc.sh cmd`` **handles relative path names transparent**,
compare output of::
Each container shares the root folder of the repository and the command
``utils/lxc.sh cmd`` **handles relative path names transparently**; compare the
output of::
$ sudo -H ./utils/lxc.sh cmd -- ls -la Makefile
...
@@ -66,6 +66,7 @@ If there comes the time you want to **get rid off all** the containers and
$ sudo -H ./utils/lxc.sh remove
$ sudo -H ./utils/lxc.sh remove images
.. _lxc.sh install suite:
Install suite
=============
......
@@ -50,6 +50,7 @@ result_xpath = '//div[@class="result results_links results_links_deep web-result'
url_xpath = './/a[@class="result__a"]/@href'
title_xpath = './/a[@class="result__a"]'
content_xpath = './/a[@class="result__snippet"]'
correction_xpath = '//div[@id="did_you_mean"]//a'
# match query's language to a region code that duckduckgo will accept
@@ -125,6 +126,11 @@ def response(resp):
'content': content,
'url': res_url})
# parse correction
for correction in eval_xpath(doc, correction_xpath):
# append correction
results.append({'correction': extract_text(correction)})
# return results
return results
......
# SPDX-License-Identifier: AGPL-3.0-or-later
"""
Gigablast (Web)
@@ -9,121 +10,117 @@
@stable yes
@parse url, title, content
"""
# pylint: disable=missing-function-docstring, invalid-name
import random
import re
from json import loads
from time import time
from lxml.html import fromstring
from searx.poolrequests import get
# from searx import logger
from searx.url_utils import urlencode
from searx.utils import eval_xpath
from searx.poolrequests import get
# engine dependent config
categories = ['general']
paging = True
number_of_results = 10
# gigablast's pagination is totally damaged, don't use it
paging = False
language_support = True
safesearch = True
# search-url
base_url = 'https://gigablast.com/'
search_string = 'search?{query}'\
'&n={number_of_results}'\
'&c=main'\
'&s={offset}'\
'&format=json'\
'&langcountry={lang}'\
'&ff={safesearch}'\
'&rand={rxikd}'
# specific xpath variables
results_xpath = '//response//result'
url_xpath = './/url'
title_xpath = './/title'
content_xpath = './/sum'
supported_languages_url = 'https://gigablast.com/search?&rxikd=1'
extra_param = '' # gigablast requires a random extra parameter
# which can be extracted from the source code of the search page
base_url = 'https://gigablast.com'
# ugly hack: gigablast requires a random extra parameter which can be extracted
# from the source code of the gigablast HTTP client
extra_param = ''
extra_param_path = '/search?c=main&qlangcountry=en-us&q=south&s=10'
def parse_extra_param(text):
global extra_param
param_lines = [x for x in text.splitlines() if x.startswith('var url=') or x.startswith('url=url+')]
extra_param = ''
for l in param_lines:
extra_param += l.split("'")[1]
extra_param = extra_param.split('&')[-1]
def init(engine_settings=None):
parse_extra_param(get('http://gigablast.com/search?c=main&qlangcountry=en-us&q=south&s=10').text)
# example:
#
# var uxrl='/search?c=main&qlangcountry=en-us&q=south&s=10&rand=1590740241635&n';
# uxrl=uxrl+'sab=730863287';
#
# extra_param --> "rand=1590740241635&nsab=730863287"
global extra_param # pylint: disable=global-statement
re_var = None
for line in text.splitlines():
if re_var is None and extra_param_path in line:
var = line.split("=")[0].split()[1] # e.g. var --> 'uxrl'
re_var = re.compile(var + "\\s*=\\s*" + var + "\\s*\\+\\s*'" + "(.*)" + "'(.*)")
extra_param = line.split("'")[1][len(extra_param_path):]
continue
if re_var is not None and re_var.search(line):
extra_param += re_var.search(line).group(1)
break
# logger.debug('gigablast extra_param="%s"', extra_param)
def init(engine_settings=None): # pylint: disable=unused-argument
parse_extra_param(get(base_url + extra_param_path).text)
# do search-request
def request(query, params):
print("EXTRAPARAM:", extra_param)
offset = (params['pageno'] - 1) * number_of_results
def request(query, params): # pylint: disable=unused-argument
if params['language'] == 'all':
language = 'xx'
else:
language = params['language'].replace('-', '_').lower()
if language.split('-')[0] != 'zh':
language = language.split('-')[0]
# see API http://www.gigablast.com/api.html#/search
# Take into account, that the API has some quirks ..
if params['safesearch'] >= 1:
safesearch = 1
else:
safesearch = 0
query_args = dict(
c = 'main'
, format = 'json'
, q = query
, dr = 1
, showgoodimages = 0
)
# rxieu is some kind of hash from the search query, but accepts random atm
search_path = search_string.format(query=urlencode({'q': query}),
offset=offset,
number_of_results=number_of_results,
lang=language,
rxikd=int(time() * 1000),
safesearch=safesearch)
if params['language'] and params['language'] != 'all':
query_args['qlangcountry'] = params['language']
query_args['qlang'] = params['language'].split('-')[0]
params['url'] = base_url + search_path + '&' + extra_param
if params['safesearch'] >= 1:
query_args['ff'] = 1
return params
search_url = '/search?' + urlencode(query_args)
params['url'] = base_url + search_url + extra_param
return params
# get response from search-request
def response(resp):
results = []
# parse results
try:
response_json = loads(resp.text)
except:
parse_extra_param(resp.text)
raise Exception('extra param expired, please reload')
response_json = loads(resp.text)
# logger.debug('gigablast returns %s results', len(response_json['results']))
for result in response_json['results']:
# append result
results.append({'url': result['url'],
'title': result['title'],
'content': result['sum']})
# see "Example JSON Output (&format=json)"
# at http://www.gigablast.com/api.html#/search
# return results
return results
# sort out meaningless result
title = result.get('title')
if len(title) < 2:
continue
url = result.get('url')
if len(url) < 9:
continue
content = result.get('sum')
if len(content) < 5:
continue
# extend fields
# get supported languages from their site
def _fetch_supported_languages(resp):
supported_languages = []
dom = fromstring(resp.text)
links = eval_xpath(dom, '//span[@id="menu2"]/a')
for link in links:
href = eval_xpath(link, './@href')[0].split('lang%3A')
if len(href) == 2:
code = href[1].split('_')
if len(code) == 2:
code = code[0] + '-' + code[1].upper()
else:
code = code[0]
supported_languages.append(code)
return supported_languages
subtitle = result.get('title')
if len(subtitle) > 3 and subtitle != title:
title += " - " + subtitle
results.append(dict(
url = url
, title = title
, content = content
))
return results
"""
Google (Images)
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Google (Images)
:website: https://images.google.com (redirected to subdomain www.)
:provide-api: yes (https://developers.google.com/custom-search/)
:using-api: not the offical, since it needs registration to another service
:results: HTML
:stable: no
:template: images.html
:parse: url, title, content, source, thumbnail_src, img_src
For detailed description of the *REST-full* API see: `Query Parameter
Definitions`_.
.. admonition:: Content-Security-Policy (CSP)
This engine needs to allow images from the `data URLs`_ (prefixed with the
``data:`` scheme)::
@website https://www.google.com
@provide-api yes (https://developers.google.com/custom-search/)
Header set Content-Security-Policy "img-src 'self' data: ;"
.. _Query Parameter Definitions:
https://developers.google.com/custom-search/docs/xml_results#WebSearch_Query_Parameter_Definitions
@using-api no
@results HTML chunks with JSON inside
@stable no
@parse url, title, img_src
"""
from datetime import date, timedelta
from json import loads
from lxml import html
from searx.url_utils import urlencode
from flask_babel import gettext
from searx import logger
from searx.url_utils import urlencode, urlparse
from searx.utils import eval_xpath
from searx.engines.xpath import extract_text
# pylint: disable=unused-import
from searx.engines.google import (
supported_languages_url,
_fetch_supported_languages,
)
# pylint: enable=unused-import
from searx.engines.google import (
get_lang_country,
google_domains,
time_range_dict,
)
logger = logger.getChild('google images')
# engine dependent config
categories = ['images']
paging = True
safesearch = True
paging = False
language_support = True
use_locale_domain = True
time_range_support = True
number_of_results = 100
search_url = 'https://www.google.com/search'\
'?{query}'\
'&tbm=isch'\
'&yv=2'\
'&{search_options}'
time_range_attr = "qdr:{range}"
time_range_custom_attr = "cdr:1,cd_min:{start},cd_max{end}"
time_range_dict = {'day': 'd',
'week': 'w',
'month': 'm'}
safesearch = True
filter_mapping = {
0: 'images',
1: 'active',
2: 'active'
}
# do search-request
def request(query, params):
search_options = {
'ijn': params['pageno'] - 1,
'start': (params['pageno'] - 1) * number_of_results
}
if params['time_range'] in time_range_dict:
search_options['tbs'] = time_range_attr.format(range=time_range_dict[params['time_range']])
elif params['time_range'] == 'year':
now = date.today()
then = now - timedelta(days=365)
start = then.strftime('%m/%d/%Y')
end = now.strftime('%m/%d/%Y')
search_options['tbs'] = time_range_custom_attr.format(start=start, end=end)
def scrap_out_thumbs(dom):
"""Scrap out thumbnail data from <script> tags.
"""
ret_val = dict()
for script in eval_xpath(dom, '//script[contains(., "_setImgSrc(")]'):
_script = script.text
# _setImgSrc('0','data:image\/jpeg;base64,\/9j\/4AAQSkZJR ....');
_thumb_no, _img_data = _script[len("_setImgSrc("):-2].split(",", 1)
_thumb_no = _thumb_no.replace("'", "")
_img_data = _img_data.replace("'", "")
_img_data = _img_data.replace(r"\/", r"/")
ret_val[_thumb_no] = _img_data.replace(r"\x3d", "=")
return ret_val
if safesearch and params['safesearch']:
search_options['safe'] = 'on'
params['url'] = search_url.format(query=urlencode({'q': query}),
search_options=urlencode(search_options))
def request(query, params):
"""Google-Video search request"""
language, country, lang_country = get_lang_country(
# pylint: disable=undefined-variable
params, supported_languages, language_aliases
)
subdomain = 'www.' + google_domains.get(country.upper(), 'google.com')
query_url = 'https://' + subdomain + '/search' + "?" + urlencode({
'q': query,
'tbm': "isch",
'hl': lang_country,
'lr': "lang_" + language,
'ie': "utf8",
'oe': "utf8",
'num': 30,
})
if params['time_range'] in time_range_dict:
query_url += '&' + urlencode({'tbs': 'qdr:' + time_range_dict[params['time_range']]})
if params['safesearch']:
query_url += '&' + urlencode({'safe': filter_mapping[params['safesearch']]})
params['url'] = query_url
logger.debug("query_url --> %s", query_url)
params['headers']['Accept-Language'] = (
"%s,%s;q=0.8,%s;q=0.5" % (lang_country, language, language))
logger.debug(
"HTTP Accept-Language --> %s", params['headers']['Accept-Language'])
params['headers']['Accept'] = (
'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
)
# params['google_subdomain'] = subdomain
return params
# get response from search-request
def response(resp):
"""Get response from google's search request"""
results = []
# detect google sorry
resp_url = urlparse(resp.url)
if resp_url.netloc == 'sorry.google.com' or resp_url.path == '/sorry/IndexRedirect':
raise RuntimeWarning('sorry.google.com')
if resp_url.path.startswith('/sorry'):
raise RuntimeWarning(gettext('CAPTCHA required'))