Sensitive Words (Banned-Word Lookup)



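This article is an incomplete example: it adds a sensitive-word filter that runs on the response as it is returned.

I call it incomplete because testing showed that normal, structural parts of the page were also masked as sensitive words.

Compare the two strings below in a diff tool such as UltraEdit and you will see that

parts of the original string have been replaced in the filtered one.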

11111111111111111111
4
Original string:

<html><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/><meta content="IE=8" http-equiv="X-UA-Compatible"/><link href="/media/resources/dijit/themes/tundra/tundra.css" type="text/css" rel="stylesheet"/><link href="/media/resources/styles/standard.css" media="screen" type="text/css" rel="stylesheet"/><link href="/media/resources/images/favicon.ico" rel="SHORTCUT ICON"/><script type="text/javascript">var djConfig = {parseOnLoad: false, isDebug: false, locale: 'zh-cn'};</script><script type="text/javascript" src="/media/resources/dojo/dojo.js"></script><script type="text/javascript" src="/media/resources/spring/Spring.js"></script><script type="text/javascript" src="/media/resources/spring/Spring-Dojo.js"></script><script type="text/javascript" language="JavaScript">dojo.require("dojo.parser");</script><title>Welcome to media</title></head><body class="tundra spring"><div id="wrapper"><div version="2.0" id="header"><a title="Home" name="Home" href="/media/"><img src="/media/resources/images/banner-graphic.png"/></a></div><div version="2.0" id="menu"></div><div id="main"><div version="2.0"><script type="text/javascript">dojo.require('dijit.TitlePane');</script><div id="_title_title_id"><script type="text/javascript">Spring.addDecoration(new Spring.ElementDecoration({elementId : '_title_title_id', widgetType : 'dijit.TitlePane', widgetAttrs : {title: 'Welcome to media', open: true}})); </script><h3>Welcome to media</h3><p>Spring Roo provides interactive, lightweight and user customizable tooling that enables rapid delivery of high performance enterprise Java applications.</p></div></div><div version="2.0" id="footer"><span><a href="/media/">Home</a></span><span id="language"> | Language: <a title="Switch language to English" href="?lang=en"><img alt="Switch language to English" src="/media/resources/images/en.png" class="flag"/></a> </span><span> | Theme: <a title="standard" href="?theme=standard">standard</a> | <a title="alt" 
href="?theme=alt">alt</a></span><span><a title="Sponsored by SpringSource" href="http://springsource.com"><img src="/media/resources/images/springsource-logo.png" alt="Sponsored by SpringSource" align="right"/></a></span></div></div></div></body></html>
Filtered string:

<html><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/><meta content="IE=8" http-equiv="X-UA-Compatible"/><link href="/media/resources/dijit/themes/tundra/tundra.css" type="text/css" rel="stylesheet"/><link href="/media/resources/styles/standard.css" media="screen" type="text/css" rel="stylesheet"/><link href="/media/resources/images/favicon.ico" rel="SHORTCUT ICON"/><script type="text/javascript">var djConfig = {parseOnLoad: false, isDebug: false, locale: 'zh-cn'};</script><script type="text/javascript" src="/media/resources/dojo/dojo.js"></script><script type="text/javascript" src="/media/resources/spring/Spring.js"></script><script type="text/javascript" src="/media/resources/spring/Spring-Dojo.js"></script><script type="text/javascript" language="JavaScript">dojo.require("dojo.parser");</script><title>Welcome to media</title></head><body class="tundra spring"><div id="wrapper"><div version="2.0" id="header"><a title="Home" name="Home" href="/media/"><img src="/media/resources/imageX/Xanner-graphic.png"/></a></div><div version="2.0" id="menu"></div><div id="main"><div version="2.0"><script type="text/javascript">dojo.require('dijit.TitlePane');</script><div id="_title_title_id"><script type="text/javascript">Spring.addDecoration(new Spring.ElementDecoration({elementId : '_title_title_id', widgetType : 'dijit.TitlePane', widgetAttrs : {title: 'Welcome to media', open: true}})); </script><h3>Welcome to media</h3><p>Spring Roo provides interactive, lightweight and user customizable tooling that enables rapid delivery of high performance enterprise Java applicatiXXX.</p></div></div><div version="2.0" id="footer"><span><a href="/media/">Home</a></span><span id="language"> | Language: <a title="Switch language to English" href="?lang=en"><img alt="Switch language to English" src="/media/resources/images/en.png" class="flag"/></a> </span><span> | Theme: <a title="standard" href="?theme=standard">standard</a> | <a title="alt" 
href="?theme=alt">alt</a></span><span><a title="SpXXXored by SpringSource" href="http://springsource.com"><img src="/media/resources/images/springsource-logo.png" alt="SpXXXored by SpringSource" align="right"/></a></span></div></div></div></body></html>
Sensitive word list:sb,ons,ons,ons
Sensitive word list length:14
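The reported list (sb,ons,ons,ons) lines up with the differences between the two strings: "ons" is masked inside "applications" and twice inside "Sponsored", and "sb" shows up where "images/banner" became "imageX/Xanner". The last case looks like a side effect of removing punctuation before matching. A minimal sketch of that effect follows; the class name PunctuationStripDemo is made up for illustration, and the character classes are the ones used by the filter's isPunctuationChar.

```java
import java.util.regex.Pattern;

class PunctuationStripDemo {
    // Same Unicode categories the filter ignores: punctuation, separators,
    // symbols, marks and control/other characters.
    private static final Pattern IGNORED = Pattern.compile("[\\pP\\pZ\\pS\\pM\\pC]");

    // Returns the text with every ignored character removed.
    static String strip(String text) {
        return IGNORED.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String path = "/media/resources/images/banner-graphic.png";
        System.out.println(path.contains("sb"));        // false
        System.out.println(strip(path));                // mediaresourcesimagesbannergraphicpng
        System.out.println(strip(path).contains("sb")); // true
    }
}
```

Once the slash is gone, "images" and "banner" sit next to each other and the two-letter word matches across what used to be a path separator.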

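OK, here is how it is set up.

Five parts need to be configured:

1. Add a filter, MySensitiveWordFilter.java.

2. Add a sensitive-word list, sensitive.txt (in the same directory as MySensitiveWordFilter.java).

3. Add FilteredResult.java to hold the filtering result.

4. Add the filter configuration to web.xml.

5. Modify the pom so that non-Java files, such as the word list, are not dropped at packaging time.

MySensitiveWordFilter.java and sensitive.txt go in the same folder.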

【1】

This is the filter that applies the sensitive words.

MySensitiveWordFilter.java

package com.hcyg.media.core.util;
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletResponseWrapper;
import com.hcyg.media.core.util.SensitiveWord.FilteredResult;
public class MySensitiveWordFilter implements Filter {
    // private WordFilterUtil wordFilterUtil ;
    private final String ENCODING = null;
    private Node tree = new Node();
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException,
            ServletException {
        PrintWriter out = response.getWriter();
        // Capture whatever the rest of the chain writes so it can be filtered before it is sent.
        CharResponseWrapper wrapper = new CharResponseWrapper((HttpServletResponse) response);
        chain.doFilter(request, wrapper);
        String resStr = wrapper.toString();
        FilteredResult res = filterText(resStr, 'X');
        System.out.println("11111111111111111111");
        System.out.println(res.getLevel());// highest level among the detected sensitive words; 0 is the lowest
        System.out.println("Original string:"+res.getOriginalContent());// original string
        System.out.println("Filtered string:"+res.getFilteredContent().toString());// filtered string
        System.out.println("Sensitive word list:"+res.getBadWords());// comma-separated list of matched words
        System.out.println("Sensitive word list length:"+res.getBadWords().length());// length of that list as a string
        String newStr = res.getFilteredContent();
        out.println(newStr);
    }
    class CharResponseWrapper extends HttpServletResponseWrapper {
        private CharArrayWriter output;
        public String toString() {
            return output.toString();
        }
        public CharResponseWrapper(HttpServletResponse response) {
            super(response);
            output = new CharArrayWriter();
        }
        public PrintWriter getWriter() {
            return new PrintWriter(output);
        }
    }
    public void destroy() {
    }
    /**
     * Loads the word list when the filter is initialized.
     */
    @Override
    public void init(FilterConfig filterConfig) throws ServletException {
        // sensitive.txt is looked up on the classpath, relative to this class's package.
        String s_xmlpath = "./sensitive.txt";
        // try-with-resources closes the stream on every path, replacing the repeated close() calls.
        try (InputStream is = MySensitiveWordFilter.class.getResourceAsStream(s_xmlpath)) {
            if (is == null) {
                throw new ServletException("Word list not found on the classpath: " + s_xmlpath);
            }
            // Each entry is word=level; read as UTF-8 so non-ASCII words survive.
            InputStreamReader reader = new InputStreamReader(is, "UTF-8");
            Properties prop = new Properties();
            prop.load(reader);
            Enumeration en = prop.propertyNames();
            while (en.hasMoreElements()) {
                String word = (String) en.nextElement();
                insertWord(word, Integer.valueOf(prop.getProperty(word)).intValue());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    private void insertWord(String word, int level) {
        Node node = tree;
        for (int i = 0; i < word.length(); i++) {
            node = node.addChar(word.charAt(i));
        }
        node.setEnd(true);
        node.setLevel(level);
    }
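    private boolean isPunctuationChar(String c) {
        // Unicode punctuation (P), separators (Z), symbols (S), marks (M) and control/other (C).
        // The backslashes must be doubled inside a Java string literal.
        String regex = "[\\pP\\pZ\\pS\\pM\\pC]";
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(c);
        return m.find();
    }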
    private PunctuationOrHtmlFilteredResult filterPunctation(String originalString) {
        StringBuffer filteredString = new StringBuffer();
        ArrayList charOffsets = new ArrayList();
        for (int i = 0; i < originalString.length(); i++) {
            String c = String.valueOf(originalString.charAt(i));
            if (!isPunctuationChar(c)) {
                filteredString.append(c);
                charOffsets.add(Integer.valueOf(i));
            }
        }
        PunctuationOrHtmlFilteredResult result = new PunctuationOrHtmlFilteredResult();
        result.setOriginalString(originalString);
        result.setFilteredString(filteredString);
        result.setCharOffsets(charOffsets);
        return result;
    }
    private PunctuationOrHtmlFilteredResult filterPunctationAndHtml(String originalString) {
        StringBuffer filteredString = new StringBuffer();
        ArrayList charOffsets = new ArrayList();
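        for (int i = 0; i < originalString.length(); i++) {
            String c = String.valueOf(originalString.charAt(i));
            if (originalString.charAt(i) == '<') {
                // Look for the closing '>' of this tag, stopping at any other '<'.
                // (Resetting k to i on a second '<', as before, never terminated.)
                int k;
                for (k = i + 1; k < originalString.length(); k++) {
                    char ch = originalString.charAt(k);
                    if (ch == '<' || ch == '>') {
                        break;
                    }
                }
                if (k < originalString.length() && originalString.charAt(k) == '>') {
                    i = k; // skip the whole tag
                }
                // Otherwise this '<' does not open a tag; it is a symbol and is dropped anyway.
            } else if (!isPunctuationChar(c)) {
                filteredString.append(c);
                charOffsets.add(Integer.valueOf(i));
            }
        }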
        PunctuationOrHtmlFilteredResult result = new PunctuationOrHtmlFilteredResult();
        result.setOriginalString(originalString);
        result.setFilteredString(filteredString);
        result.setCharOffsets(charOffsets);
        return result;
    }
    private   FilteredResult filter(PunctuationOrHtmlFilteredResult pohResult, char replacement) {
        StringBuffer sentence = pohResult.getFilteredString();
        ArrayList charOffsets = pohResult.getCharOffsets();
        StringBuffer resultString = new StringBuffer(pohResult.getOriginalString());
        StringBuffer badWords = new StringBuffer();
        int level = 0;
        Node node = tree;
        int start = 0;
        int end = 0;
        for (int i = 0; i < sentence.length(); i++) {
            start = i;
            end = i;
            node = tree;
            for (int j = i; j < sentence.length(); j++) {
                node = node.findChar(sentence.charAt(j));
                if (node == null) {
                    break;
                }
                if (node.isEnd()) {
                    end = j;
                    level = Math.max(level, node.getLevel()); // keep the highest level seen so far
                }
            }
            // Note: end > start means a one-character word is never treated as a match.
            if (end > start) {
                for (int k = start; k <= end; k++) {
                    resultString.setCharAt(((Integer) charOffsets.get(k)).intValue(), replacement);
                }
                if (badWords.length() > 0) {
                    badWords.append(",");
                }
                badWords.append(sentence.substring(start, end + 1));
                i = end;
            }
        }
        FilteredResult result = new FilteredResult();
        result.setOriginalContent(pohResult.getOriginalString());
        result.setFilteredContent(resultString.toString());
        result.setBadWords(badWords.toString());
        result.setLevel(Integer.valueOf(level));
        return result;
    }
    public   String simpleFilter(String sentence, char replacement) {
        StringBuffer sb = new StringBuffer();
        Node node = tree;
        int start = 0;
        int end = 0;
        for (int i = 0; i < sentence.length(); i++) {
            start = i;
            end = i;
            node = tree;
            for (int j = i; j < sentence.length(); j++) {
                node = node.findChar(sentence.charAt(j));
                if (node == null) {
                    break;
                }
                if (node.isEnd()) {
                    end = j;
                }
            }
            if (end > start) {
                for (int k = start; k <= end; k++) {
                    sb.append(replacement);
                }
                i = end;
            } else {
                sb.append(sentence.charAt(i));
            }
        }
        return sb.toString();
    }
    public FilteredResult filterText(String originalString, char replacement) {
        return filter(filterPunctation(originalString), replacement);
    }
    public FilteredResult filterHtml(String originalString, char replacement) {
        return filter(filterPunctationAndHtml(originalString), replacement);
    }
    private class PunctuationOrHtmlFilteredResult {
        private String originalString;
        private StringBuffer filteredString;
        private ArrayList charOffsets;
        public String getOriginalString() {
            return this.originalString;
        }
        public void setOriginalString(String originalString) {
            this.originalString = originalString;
        }
        public StringBuffer getFilteredString() {
            return this.filteredString;
        }
        public void setFilteredString(StringBuffer filteredString) {
            this.filteredString = filteredString;
        }
        public ArrayList getCharOffsets() {
            return this.charOffsets;
        }
        public void setCharOffsets(ArrayList charOffsets) {
            this.charOffsets = charOffsets;
        }
    }
    class Node {
        private Map children = new HashMap(0);
        private boolean isEnd = false;
        private int level = 0;
        public Node addChar(char c) {
            String cStr = String.valueOf(c);
            Node node = (Node) this.children.get(cStr);
            if (node == null) {
                node = new Node();
                this.children.put(cStr, node);
            }
            return node;
        }
        public Node findChar(char c) {
            String cStr = String.valueOf(c);
            return (Node) this.children.get(cStr);
        }
        public boolean isEnd() {
            return this.isEnd;
        }
        public void setEnd(boolean isEnd) {
            this.isEnd = isEnd;
        }
        public int getLevel() {
            return this.level;
        }
        public void setLevel(int level) {
            this.level = level;
        }
    }
}
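Since doFilter calls filterText, tags and attribute values are matched like any other text, which is where the damaged image path comes from. One possible way around it is to match only the text outside tags and to leave punctuation in place. The sketch below is a standalone illustration, not the filter above: the class TagAwareMasker and its methods are hypothetical names, and it makes the simplifying assumptions that every '<' opens a tag, that no '>' occurs inside an attribute value, and that script and style bodies need no special handling.

```java
import java.util.HashMap;
import java.util.Map;

class TagAwareMasker {
    // One trie node per character of a word.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean end;
    }

    private final Node root = new Node();

    // Adds a word to the trie.
    void addWord(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.end = true;
    }

    // Masks words found in text nodes; everything between '<' and '>' is left untouched.
    String mask(String html, char replacement) {
        StringBuilder out = new StringBuilder(html);
        boolean inTag = false;
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            if (inTag) {
                if (c == '>') {
                    inTag = false;
                }
                i++;
            } else if (c == '<') {
                inTag = true;
                i++;
            } else {
                int len = longestMatch(html, i);
                for (int k = i; k < i + len; k++) {
                    out.setCharAt(k, replacement);
                }
                i += Math.max(len, 1);
            }
        }
        return out.toString();
    }

    // Length of the longest word starting at 'from', or 0 if none; never crosses a '<'.
    private int longestMatch(String text, int from) {
        Node node = root;
        int best = 0;
        for (int j = from; j < text.length(); j++) {
            char c = text.charAt(j);
            if (c == '<') {
                break;
            }
            node = node.children.get(c);
            if (node == null) {
                break;
            }
            if (node.end) {
                best = j - from + 1;
            }
        }
        return best;
    }
}
```

With the words sb and ons loaded, mask("<img src=\"/images/banner.png\"/><p>applications sb</p>", 'X') leaves the img tag alone and returns the paragraph as applicatiXXX XX. Substrings of ordinary words such as "applications" are still masked; avoiding that would need word-boundary checks, which this sketch does not attempt.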

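【2】

This is the sensitive-word list. Search online for one; any list with the same structure (one word=level entry per line) will do.

sensitive.txt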

加qq=4
敏感词=4
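The file is read with java.util.Properties, so each line is a key (the word) and a value (its level). A small sketch of that parsing step, using an in-memory string instead of the file; the class WordListLoader and its parse method are names invented for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

class WordListLoader {
    // Parses word=level lines into a map from word to numeric level.
    static Map<String, Integer> parse(String content) {
        Properties props = new Properties();
        try {
            // Loading from a Reader keeps non-ASCII keys intact.
            props.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Integer> levels = new TreeMap<>();
        for (String word : props.stringPropertyNames()) {
            levels.put(word, Integer.parseInt(props.getProperty(word).trim()));
        }
        return levels;
    }
}
```

For the two sample lines above, parse returns a map with two entries, both at level 4. A value that is not an integer makes Integer.parseInt throw NumberFormatException, so a malformed list fails loudly at load time.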

【3】FilteredResult, the filtering result

/**
 *@Copyright:Copyright (c) 2008 - 2100
 *@Company:hcyg
 */
package com.hcyg.media.core.util.SensitiveWord;
/**
 *@Title:FilteredResult
 *@Description:
 *@Author:zp
 *@Since:2015-8-1
 *@Version:1.0.0
 */ 
public class FilteredResult
{
  private Integer level;
  private String filteredContent;
  private String badWords;
  private String originalContent;
  public String getBadWords()
  {
    return this.badWords;
  }
  public void setBadWords(String badWords)
  {
    this.badWords = badWords;
  }
  public FilteredResult() {}
  public FilteredResult(String originalContent, String filteredContent, Integer level, String badWords)
  {
    this.originalContent = originalContent;
    this.filteredContent = filteredContent;
    this.level = level;
    this.badWords = badWords;
  }
  public Integer getLevel()
  {
    return this.level;
  }
  public void setLevel(Integer level)
  {
    this.level = level;
  }
  public String getFilteredContent()
  {
    return this.filteredContent;
  }
  public void setFilteredContent(String filteredContent)
  {
    this.filteredContent = filteredContent;
  }
  public String getOriginalContent()
  {
    return this.originalContent;
  }
  public void setOriginalContent(String originalContent)
  {
    this.originalContent = originalContent;
  }
}

【4】Add the filter configuration to web.xml.

    <filter>
        <filter-name>MySensitiveWordFilter</filter-name>
        <filter-class>com.hcyg.media.core.util.MySensitiveWordFilter</filter-class>
    </filter>
    <filter-mapping>
        <filter-name>MySensitiveWordFilter</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>

【5】Configure the pom as follows; otherwise non-Java files are dropped at packaging time.

<build>
        <finalName>media</finalName>
        <resources>
            <resource>
                <directory>src/main/java</directory>
                <includes>
                    <include>**/*.properties</include>
                    <include>**/*.xml</include>
                    <include>**/*.txt</include>
                </includes>
                
                <filtering>false</filtering>
            </resource>
            <resource>
                <directory>src/main/resources</directory>
                <includes>
                    <include>**/*.properties</include>
                    <include>**/*.xml</include>
                </includes>
                <filtering>true</filtering>
            </resource>
        </resources>
        ...
    </build>
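To confirm that the word list really ended up next to the compiled class after packaging, a quick classpath probe can help. This is only a sketch; ResourceCheck and isPackaged are hypothetical names, and the lookup uses the same Class.getResourceAsStream call the filter relies on.

```java
import java.io.IOException;
import java.io.InputStream;

class ResourceCheck {
    // Returns true if 'name' can be opened as a classpath resource relative to 'anchor'.
    static boolean isPackaged(Class<?> anchor, String name) {
        try (InputStream in = anchor.getResourceAsStream(name)) {
            return in != null;
        } catch (IOException e) {
            // Only close() can throw here; treat it as "not usable".
            return false;
        }
    }
}
```

In the application this would be called as ResourceCheck.isPackaged(MySensitiveWordFilter.class, "sensitive.txt"); a result of false suggests the resources section of the pom is not picking up the .txt file.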
