words-match

words-match组件是基于字典树(DFA)并利用UnixSock通讯和自定义进程实现,开发本组件的目的是帮小伙伴们快速部署内容检测服务。

使用场景

  • 跟文字内容相关的产品都有应用场景。

  • 博客类的文章,评论的检测

  • 聊天内容的检测

  • 对垃圾内容的屏蔽

组件要求

None

安装方法

composer require easyswoole/words-match:1.1.x-dev

仓库地址

easyswoole/words-match

基本使用

准备词库

服务启动的时候会一行一行将数据读出来,每一行的第一列为敏感词,其它列为附属信息

php,是世界上,最好的语言
java
golang
程序员
代码
逻辑

服务注册

<?php
namespace EasySwoole\EasySwoole;

use EasySwoole\EasySwoole\Swoole\EventRegister;
use EasySwoole\EasySwoole\AbstractInterface\Event;
use EasySwoole\Http\Request;
use EasySwoole\Http\Response;
use EasySwoole\WordsMatch\WordsMatchClient;
use EasySwoole\WordsMatch\WordsMatchServer;

class EasySwooleEvent implements Event
{

    public static function initialize()
    {
        // TODO: Implement initialize() method.
        date_default_timezone_set('Asia/Shanghai');
    }

    public static function mainServerCreate(EventRegister $register)
    {
        // TODO: Implement mainServerCreate() method.
        $config = [
            'wordBanks' => [
                'one' => '/Users/xxx/sites/easyswoole/WordsMatch/comment.txt',
                'two' => '/Users/xxx/sites/easyswoole/WordsMatch/comment1.txt'
            ], // 词库地址
            'processNum' => 3, // 进程数
            'maxMem' => 1024, // 每个进程最大占用内存(M)
            'separator' => ',', // 词和其它信息的间隔符
        ];
        WordsMatchServer::getInstance()
            ->setConfig($config)
            ->attachToServer(ServerManager::getInstance()->getSwooleServer());
    }

    public static function onRequest(Request $request, Response $response): bool
    {
        // TODO: Implement onRequest() method.
        return true;
    }

    public static function afterRequest(Request $request, Response $response): void
    {
        // TODO: Implement afterAction() method.
    }
}

客户端使用

<?php

namespace App\HttpController;

use EasySwoole\Http\AbstractInterface\Controller;
use EasySwoole\WordsMatch\WordsMatchClient;

class Index extends Controller
{

    function append()
    {
        WordsMatchClient::getInstance()
                    ->setWordBanks(['one']) // 必须指定词库
                    ->append('测试词1');
    }

    function detect()
    {
        $content = 'php是世界上最好的语言';
        // 检测内容,不指定词库则检测所有词库
        WordsMatchClient::getInstance()
                    ->setWordBanks(['one']) // 不指定词库则检测所有词库
                    ->detect($content);
    }

    function remove()
    {
        WordsMatchClient::getInstance()
                        ->setWordBanks(['one']) // 必须指定词库
                        ->remove('测试词1');
    }

}

压测结果

对此组件分别进行1.5万、13万等级的词库测试,服务默认开启3个进程。

仅做参考,具体还以线上验证

电脑配置

MacBook Air (13-inch, 2017)
处理器 1.8 GHz Intel Core i5
内存 8 GB 1600 MHz DDR3

1.5万词

并发10总请求数100
10 100
Concurrency Level:      10
Time taken for tests:   0.067 seconds
Complete requests:      100
Failed requests:        0
Non-2xx responses:      100
Total transferred:      17300 bytes
HTML transferred:       2600 bytes
Requests per second:    1492.49 [#/sec] (mean)
Time per request:       6.700 [ms] (mean)
Time per request:       0.670 [ms] (mean, across all concurrent requests)
Transfer rate:          252.15 [Kbytes/sec] received
并发100总请求数1000
Concurrency Level:      100
Time taken for tests:   0.239 seconds
Complete requests:      1000
Failed requests:        0
Non-2xx responses:      1000
Total transferred:      173000 bytes
HTML transferred:       26000 bytes
Requests per second:    4189.17 [#/sec] (mean)
Time per request:       23.871 [ms] (mean)
Time per request:       0.239 [ms] (mean, across all concurrent requests)
Transfer rate:          707.74 [Kbytes/sec] received

13万词

并发10总请求数100
Concurrency Level:      10
Time taken for tests:   0.057 seconds
Complete requests:      100
Failed requests:        0
Non-2xx responses:      100
Total transferred:      17300 bytes
HTML transferred:       2600 bytes
Requests per second:    1751.71 [#/sec] (mean)
Time per request:       5.709 [ms] (mean)
Time per request:       0.571 [ms] (mean, across all concurrent requests)
Transfer rate:          295.94 [Kbytes/sec] received
并发100总请求数1000
Concurrency Level:      100
Time taken for tests:   0.225 seconds
Complete requests:      1000
Failed requests:        0
Non-2xx responses:      1000
Total transferred:      173000 bytes
HTML transferred:       26000 bytes
Requests per second:    4444.84 [#/sec] (mean)
Time per request:       22.498 [ms] (mean)
Time per request:       0.225 [ms] (mean, across all concurrent requests)
Transfer rate:          750.93 [Kbytes/sec] received