Sensitive words (banned-word lookup)
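This article is an imperfect example: it adds a sensitive-word filter that runs on the response as it is returned.
It is imperfect because testing showed that normal, structural parts of the page were also treated as sensitive words and masked.
Compare the two strings below in UltraEdit and you will see
which parts of the original string were replaced in the filtered string.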
11111111111111111111
4
Original string:
<html><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/><meta content="IE=8" http-equiv="X-UA-Compatible"/><link href="/media/resources/dijit/themes/tundra/tundra.css" type="text/css" rel="stylesheet"/><link href="/media/resources/styles/standard.css" media="screen" type="text/css" rel="stylesheet"/><link href="/media/resources/images/favicon.ico" rel="SHORTCUT ICON"/><script type="text/javascript">var djConfig = {parseOnLoad: false, isDebug: false, locale: 'zh-cn'};</script><script type="text/javascript" src="/media/resources/dojo/dojo.js"></script><script type="text/javascript" src="/media/resources/spring/Spring.js"></script><script type="text/javascript" src="/media/resources/spring/Spring-Dojo.js"></script><script type="text/javascript" language="JavaScript">dojo.require("dojo.parser");</script><title>Welcome to media</title></head><body class="tundra spring"><div id="wrapper"><div version="2.0" id="header"><a title="Home" name="Home" href="/media/"><img src="/media/resources/images/banner-graphic.png"/></a></div><div version="2.0" id="menu"></div><div id="main"><div version="2.0"><script type="text/javascript">dojo.require('dijit.TitlePane');</script><div id="_title_title_id"><script type="text/javascript">Spring.addDecoration(new Spring.ElementDecoration({elementId : '_title_title_id', widgetType : 'dijit.TitlePane', widgetAttrs : {title: 'Welcome to media', open: true}})); </script><h3>Welcome to media</h3><p>Spring Roo provides interactive, lightweight and user customizable tooling that enables rapid delivery of high performance enterprise Java applications.</p></div></div><div version="2.0" id="footer"><span><a href="/media/">Home</a></span><span id="language"> | Language: <a title="Switch language to English" href="?lang=en"><img alt="Switch language to English" src="/media/resources/images/en.png" class="flag"/></a> </span><span> | Theme: <a title="standard" href="?theme=standard">standard</a> | <a title="alt" 
href="?theme=alt">alt</a></span><span><a title="Sponsored by SpringSource" href="http://springsource.com"><img src="/media/resources/images/springsource-logo.png" alt="Sponsored by SpringSource" align="right"/></a></span></div></div></div></body></html>
Filtered string:
<html><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/><meta content="IE=8" http-equiv="X-UA-Compatible"/><link href="/media/resources/dijit/themes/tundra/tundra.css" type="text/css" rel="stylesheet"/><link href="/media/resources/styles/standard.css" media="screen" type="text/css" rel="stylesheet"/><link href="/media/resources/images/favicon.ico" rel="SHORTCUT ICON"/><script type="text/javascript">var djConfig = {parseOnLoad: false, isDebug: false, locale: 'zh-cn'};</script><script type="text/javascript" src="/media/resources/dojo/dojo.js"></script><script type="text/javascript" src="/media/resources/spring/Spring.js"></script><script type="text/javascript" src="/media/resources/spring/Spring-Dojo.js"></script><script type="text/javascript" language="JavaScript">dojo.require("dojo.parser");</script><title>Welcome to media</title></head><body class="tundra spring"><div id="wrapper"><div version="2.0" id="header"><a title="Home" name="Home" href="/media/"><img src="/media/resources/imageX/Xanner-graphic.png"/></a></div><div version="2.0" id="menu"></div><div id="main"><div version="2.0"><script type="text/javascript">dojo.require('dijit.TitlePane');</script><div id="_title_title_id"><script type="text/javascript">Spring.addDecoration(new Spring.ElementDecoration({elementId : '_title_title_id', widgetType : 'dijit.TitlePane', widgetAttrs : {title: 'Welcome to media', open: true}})); </script><h3>Welcome to media</h3><p>Spring Roo provides interactive, lightweight and user customizable tooling that enables rapid delivery of high performance enterprise Java applicatiXXX.</p></div></div><div version="2.0" id="footer"><span><a href="/media/">Home</a></span><span id="language"> | Language: <a title="Switch language to English" href="?lang=en"><img alt="Switch language to English" src="/media/resources/images/en.png" class="flag"/></a> </span><span> | Theme: <a title="standard" href="?theme=standard">standard</a> | <a title="alt" 
href="?theme=alt">alt</a></span><span><a title="SpXXXored by SpringSource" href="http://springsource.com"><img src="/media/resources/images/springsource-logo.png" alt="SpXXXored by SpringSource" align="right"/></a></span></div></div></div></body></html>
Sensitive word list: sb,ons,ons,ons
Sensitive word list length: 14
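The masked spots above follow from how the filter prepares its input: punctuation is stripped before matching, so `s/b` is seen as `sb`, and tag or attribute text is scanned like any other text. Below is a minimal sketch of that effect. The class name `PunctuationDemo` and the fixed two-word list are assumptions for illustration, and it uses plain `indexOf` instead of the article's trie.

```java
import java.util.ArrayList;
import java.util.List;

public class PunctuationDemo {
    // Hypothetical word list, taken from the "sensitive word list" printed above.
    static final String[] WORDS = {"sb", "ons"};

    /** Masks every listed word, matching across punctuation as the filter does. */
    static String mask(String original, char replacement) {
        StringBuilder stripped = new StringBuilder();
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < original.length(); i++) {
            String c = String.valueOf(original.charAt(i));
            // Same character classes as the filter: punctuation, separators,
            // symbols, marks and control characters are dropped before matching.
            if (!c.matches("[\\pP\\pZ\\pS\\pM\\pC]")) {
                stripped.append(c);
                offsets.add(i); // remember where this character sat in the original
            }
        }
        StringBuilder result = new StringBuilder(original);
        for (String word : WORDS) {
            int from = 0;
            int hit;
            while ((hit = stripped.indexOf(word, from)) >= 0) {
                // Map each matched position back to the original string and mask it.
                for (int k = hit; k < hit + word.length(); k++) {
                    result.setCharAt(offsets.get(k), replacement);
                }
                from = hit + word.length();
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(mask("/media/resources/images/banner-graphic.png", 'X'));
        System.out.println(mask("Java applications.", 'X'));
    }
}
```

Because the `/` between `images` and `banner` is removed before matching, the two halves of `sb` end up adjacent, which is how a resource path gets damaged.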
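OK, here is the approach.
Five parts need to be configured:
1. Add a filter, MySensitiveWordFilter.java.
2. Add a sensitive-word dictionary, sensitive.txt (in the same directory as MySensitiveWordFilter.java).
3. Add FilteredResult.java to hold the filtering result.
4. Add the filter configuration to web.xml.
5. Modify the pom so that non-Java files are not dropped during packaging.
MySensitiveWordFilter.java and sensitive.txt go in the same folder.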
【1】
This is the filter that applies the sensitive words.
MySensitiveWordFilter.java
package com.hcyg.media.core.util;
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletResponseWrapper;
import com.hcyg.media.core.util.SensitiveWord.FilteredResult;
public class MySensitiveWordFilter implements Filter {
// private WordFilterUtil wordFilterUtil ;
private final String ENCODING = null;
private Node tree = new Node();
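public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException,
ServletException {
// Capture everything the rest of the chain writes through getWriter().
CharResponseWrapper wrapper = new CharResponseWrapper((HttpServletResponse) response);
chain.doFilter(request, wrapper);
String resStr = wrapper.toString();
if (resStr.isEmpty()) {
// Nothing was captured, for example a binary response sent through getOutputStream(); leave it untouched.
return;
}
// filterText also scans tag names and attribute values, which is why markup gets masked in the output shown earlier.
// filterHtml skips everything between '<' and '>'.
FilteredResult res = filterText(resStr, 'X');
System.out.println("11111111111111111111");
System.out.println(res.getLevel());// highest level among the detected sensitive words; 0 is the lowest
System.out.println("Original string: "+res.getOriginalContent());
System.out.println("Filtered string: "+res.getFilteredContent());
System.out.println("Sensitive word list: "+res.getBadWords());
System.out.println("Sensitive word list length: "+res.getBadWords().length());
// Ask for the real writer only now: calling response.getWriter() before the chain runs
// makes a later getOutputStream() call on the same response fail with IllegalStateException.
PrintWriter out = response.getWriter();
out.write(res.getFilteredContent());
}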
class CharResponseWrapper extends HttpServletResponseWrapper {
private CharArrayWriter output;
public String toString() {
return output.toString();
}
public CharResponseWrapper(HttpServletResponse response) {
super(response);
output = new CharArrayWriter();
}
public PrintWriter getWriter() {
return new PrintWriter(output);
}
}
public void destroy() {
}
/**
* Loads the word list when the filter is initialized.
*/
@Override
public void init(FilterConfig filterConfig) throws ServletException {
// sensitive.txt sits in the same package as this class, so a relative resource name is enough.
InputStream is = MySensitiveWordFilter.class.getResourceAsStream("sensitive.txt");
if (is == null) {
throw new ServletException("sensitive.txt was not found next to MySensitiveWordFilter");
}
try {
InputStreamReader reader = new InputStreamReader(is, "UTF-8");
Properties prop = new Properties();
prop.load(reader);
Enumeration en = prop.propertyNames();
while (en.hasMoreElements()) {
String word = (String) en.nextElement();
// Each entry is word=level, and the level must be an integer.
insertWord(word, Integer.valueOf(prop.getProperty(word).trim()).intValue());
}
} catch (IOException e) {
// UnsupportedEncodingException is an IOException, so one handler covers both.
e.printStackTrace();
} finally {
// Close the stream exactly once, here, instead of in every catch block.
try {
is.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
private void insertWord(String word, int level) {
Node node = tree;
for (int i = 0; i < word.length(); i++) {
node = node.addChar(word.charAt(i));
}
node.setEnd(true);
node.setLevel(level);
}
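private boolean isPunctuationChar(String c) {
// Unicode categories: P punctuation, Z separators, S symbols, M marks, C control and other.
// The backslashes must be doubled inside a Java string literal.
String regex = "[\\pP\\pZ\\pS\\pM\\pC]";
Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(c);
return m.find();
}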
private PunctuationOrHtmlFilteredResult filterPunctation(String originalString) {
StringBuffer filteredString = new StringBuffer();
ArrayList charOffsets = new ArrayList();
for (int i = 0; i < originalString.length(); i++) {
String c = String.valueOf(originalString.charAt(i));
if (!isPunctuationChar(c)) {
filteredString.append(c);
charOffsets.add(Integer.valueOf(i));
}
}
PunctuationOrHtmlFilteredResult result = new PunctuationOrHtmlFilteredResult();
result.setOriginalString(originalString);
result.setFilteredString(filteredString);
result.setCharOffsets(charOffsets);
return result;
}
private PunctuationOrHtmlFilteredResult filterPunctationAndHtml(String originalString) {
StringBuffer filteredString = new StringBuffer();
ArrayList charOffsets = new ArrayList();
int i = 0;
for (int k = 0; i < originalString.length(); i++) {
String c = String.valueOf(originalString.charAt(i));
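if (originalString.charAt(i) == '<') {
// Skip a tag: scan forward for the closing '>'.
for (k = i + 1; k < originalString.length(); k++) {
if (originalString.charAt(k) == '<') {
// A second '<' before any '>' means this '<' did not open a tag, so drop only this character.
// Without the break the scan restarts from i and never terminates.
k = i;
break;
}
if (originalString.charAt(k) == '>') {
break;
}
}
i = k;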
} else if (!isPunctuationChar(c)) {
filteredString.append(c);
charOffsets.add(Integer.valueOf(i));
}
}
PunctuationOrHtmlFilteredResult result = new PunctuationOrHtmlFilteredResult();
result.setOriginalString(originalString);
result.setFilteredString(filteredString);
result.setCharOffsets(charOffsets);
return result;
}
private FilteredResult filter(PunctuationOrHtmlFilteredResult pohResult, char replacement) {
StringBuffer sentence = pohResult.getFilteredString();
ArrayList charOffsets = pohResult.getCharOffsets();
StringBuffer resultString = new StringBuffer(pohResult.getOriginalString());
StringBuffer badWords = new StringBuffer();
int level = 0;
Node node = tree;
int start = 0;
int end = 0;
for (int i = 0; i < sentence.length(); i++) {
start = i;
end = i;
node = tree;
for (int j = i; j < sentence.length(); j++) {
node = node.findChar(sentence.charAt(j));
if (node == null) {
break;
}
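if (node.isEnd()) {
end = j;
// Keep the highest level seen so far, which is what the caller reports as the result level.
level = Math.max(level, node.getLevel());
}
}
// end > start requires a match of at least two characters, so one-character words are never masked.
if (end > start) {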
for (int k = start; k <= end; k++) {
resultString.setCharAt(((Integer) charOffsets.get(k)).intValue(), replacement);
}
if (badWords.length() > 0) {
badWords.append(",");
}
badWords.append(sentence.substring(start, end + 1));
i = end;
}
}
FilteredResult result = new FilteredResult();
result.setOriginalContent(pohResult.getOriginalString());
result.setFilteredContent(resultString.toString());
result.setBadWords(badWords.toString());
result.setLevel(Integer.valueOf(level));
return result;
}
public String simpleFilter(String sentence, char replacement) {
StringBuffer sb = new StringBuffer();
Node node = tree;
int start = 0;
int end = 0;
for (int i = 0; i < sentence.length(); i++) {
start = i;
end = i;
node = tree;
for (int j = i; j < sentence.length(); j++) {
node = node.findChar(sentence.charAt(j));
if (node == null) {
break;
}
if (node.isEnd()) {
end = j;
}
}
if (end > start) {
for (int k = start; k <= end; k++) {
sb.append(replacement);
}
i = end;
} else {
sb.append(sentence.charAt(i));
}
}
return sb.toString();
}
public FilteredResult filterText(String originalString, char replacement) {
return filter(filterPunctation(originalString), replacement);
}
public FilteredResult filterHtml(String originalString, char replacement) {
return filter(filterPunctationAndHtml(originalString), replacement);
}
private class PunctuationOrHtmlFilteredResult {
private String originalString;
private StringBuffer filteredString;
private ArrayList charOffsets;
public String getOriginalString() {
return this.originalString;
}
public void setOriginalString(String originalString) {
this.originalString = originalString;
}
public StringBuffer getFilteredString() {
return this.filteredString;
}
public void setFilteredString(StringBuffer filteredString) {
this.filteredString = filteredString;
}
public ArrayList getCharOffsets() {
return this.charOffsets;
}
public void setCharOffsets(ArrayList charOffsets) {
this.charOffsets = charOffsets;
}
}
class Node {
private Map children = new HashMap(0);
private boolean isEnd = false;
private int level = 0;
public Node addChar(char c) {
String cStr = String.valueOf(c);
Node node = (Node) this.children.get(cStr);
if (node == null) {
node = new Node();
this.children.put(cStr, node);
}
return node;
}
public Node findChar(char c) {
String cStr = String.valueOf(c);
return (Node) this.children.get(cStr);
}
public boolean isEnd() {
return this.isEnd;
}
public void setEnd(boolean isEnd) {
this.isEnd = isEnd;
}
public int getLevel() {
return this.level;
}
public void setLevel(int level) {
this.level = level;
}
}
}
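The `Node` class above is a trie: each sensitive word is stored as a path of characters, and matching walks that path from every start position while remembering the longest word seen. A compact sketch of the same idea follows, using generics and `Character` keys. `TrieSketch` and its method names are hypothetical and not part of the article's code.

```java
import java.util.HashMap;
import java.util.Map;

public class TrieSketch {
    /** One trie node: children keyed by character, plus an end-of-word flag. */
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean end;
    }

    private final Node root = new Node();

    /** Adds a word by walking or creating one node per character. */
    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.end = true;
    }

    /** Returns the length of the longest word starting at index from, or 0 if none. */
    int longestMatch(String text, int from) {
        Node node = root;
        int length = 0;
        for (int j = from; j < text.length(); j++) {
            node = node.children.get(text.charAt(j));
            if (node == null) {
                break; // no stored word continues with this character
            }
            if (node.end) {
                length = j - from + 1; // keep going in case a longer word also matches
            }
        }
        return length;
    }

    /** Replaces every matched word with the replacement character. */
    String mask(String text, char replacement) {
        StringBuilder sb = new StringBuilder(text);
        int i = 0;
        while (i < text.length()) {
            int length = longestMatch(text, i);
            for (int k = i; k < i + length; k++) {
                sb.setCharAt(k, replacement);
            }
            i += Math.max(length, 1); // jump past a match, otherwise advance one character
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TrieSketch trie = new TrieSketch();
        trie.insert("on");
        trie.insert("ons");
        System.out.println(trie.mask("Sponsored applications", 'X'));
    }
}
```

One difference from the listing above: this sketch tracks a match length instead of comparing `end > start`, so it would also mask one-character words.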
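【2】
This is the sensitive-word dictionary. Search online for one; any list with the same structure will do. Each line has the form word=level, where the level is an integer.
sensitive.txt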
加qq=4
敏感词=4
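Since the filter reads this file with `java.util.Properties`, a word list can be checked on its own before it is deployed. The sketch below parses text in the same word=level shape. `WordListDemo` and the sample entries are assumptions for illustration only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class WordListDemo {
    /**
     * Parses "word=level" lines into a sorted map.
     * A non-numeric level raises NumberFormatException, as it would in the filter.
     */
    static Map<String, Integer> parse(String content) {
        Properties prop = new Properties();
        try {
            prop.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Integer> levels = new TreeMap<>();
        for (String word : prop.stringPropertyNames()) {
            // trim() guards against trailing spaces after the number
            levels.put(word, Integer.valueOf(prop.getProperty(word).trim()));
        }
        return levels;
    }

    public static void main(String[] args) {
        // Hypothetical entries in the same shape as sensitive.txt.
        System.out.println(parse("sb=4\nons=2\n# lines starting with # are comments\n"));
    }
}
```

One consequence of the properties format: an unescaped space, `=` or `:` ends the key, so a word containing one of those characters has to be escaped with a backslash or it will be split at that point.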
【3】 FilteredResult, the filtering result
/**
*@Copyright:Copyright (c) 2008 - 2100
*@Company:hcyg
*/
package com.hcyg.media.core.util.SensitiveWord;
/**
*@Title:FilteredResult
*@Description:
*@Author:zp
*@Since:2015-8-1
*@Version:1.0.0
*/
public class FilteredResult
{
private Integer level;
private String filteredContent;
private String badWords;
private String originalContent;
public String getBadWords()
{
return this.badWords;
}
public void setBadWords(String badWords)
{
this.badWords = badWords;
}
public FilteredResult() {}
public FilteredResult(String originalContent, String filteredContent, Integer level, String badWords)
{
this.originalContent = originalContent;
this.filteredContent = filteredContent;
this.level = level;
this.badWords = badWords;
}
public Integer getLevel()
{
return this.level;
}
public void setLevel(Integer level)
{
this.level = level;
}
public String getFilteredContent()
{
return this.filteredContent;
}
public void setFilteredContent(String filteredContent)
{
this.filteredContent = filteredContent;
}
public String getOriginalContent()
{
return this.originalContent;
}
public void setOriginalContent(String originalContent)
{
this.originalContent = originalContent;
}
}
【4】 Add the filter configuration to web.xml.
<filter>
<filter-name>MySensitiveWordFilter</filter-name>
<filter-class>com.hcyg.media.core.util.MySensitiveWordFilter</filter-class>
</filter>
<filter-mapping>
<filter-name>MySensitiveWordFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
【5】 Configure the pom as follows; otherwise non-Java files are dropped during packaging.
<build>
<finalName>media</finalName>
<resources>
<resource>
<directory>src/main/java</directory>
<includes>
<include>**/*.properties</include>
<include>**/*.xml</include>
<include>**/*.txt</include>
</includes>
<filtering>false</filtering>
</resource>
<resource>
<directory>src/main/resources</directory>
<includes>
<include>**/*.properties</include>
<include>**/*.xml</include>
</includes>
<filtering>true</filtering>
</resource>
</resources>
...
</build>