Search results: 360 posts
2021-07-01
Installing the Realtek 8813-series wireless adapter driver on Ubuntu

1. Confirm the adapter model

    $ lsusb
    Bus 001 Device 017: ID 0bda:8813 Realtek Semiconductor Corp.

2. Download the driver

This driver works: https://github.com/zebulon2/rtl8814au

3. Install the driver

    git clone https://github.com/zebulon2/rtl8814au.git
    cd rtl8814au
    make
    sudo make install
    sudo modprobe 8814au

References
Alfa AWUS1900 driver support: https://askubuntu.com/questions/981638/alfa-awus1900-driver-support
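The model check in step 1 can also be scripted. The following is a minimal sketch that parses USB listing text of the form shown above and looks for the 0bda:8813 ID; the function name `find_usb_device` and the sample text are illustrative assumptions, not part of the driver or of any system tool.

```python
import re

# Matches one line of USB listing output, for example:
# "Bus 001 Device 017: ID 0bda:8813 Realtek Semiconductor Corp."
USB_LINE = re.compile(
    r"Bus (?P<bus>\d+) Device (?P<device>\d+): "
    r"ID (?P<vendor>[0-9a-fA-F]{4}):(?P<product>[0-9a-fA-F]{4})\s*(?P<name>.*)"
)


def find_usb_device(listing, vendor_id, product_id):
    """Return parsed entries whose vendor:product ID matches (hypothetical helper)."""
    matches = []
    for line in listing.splitlines():
        m = USB_LINE.match(line.strip())
        if m is None:
            continue
        if (m.group("vendor").lower() == vendor_id.lower()
                and m.group("product").lower() == product_id.lower()):
            matches.append(m.groupdict())
    return matches


# Sample text for illustration; on a real machine this would be the captured
# output of the listing command from step 1.
sample = (
    "Bus 001 Device 017: ID 0bda:8813 Realtek Semiconductor Corp.\n"
    "Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub"
)
found = find_usb_device(sample, "0bda", "8813")
print(found[0]["name"] if found else "adapter not found")
```

If the list comes back empty, the adapter is either not plugged in or reports a different ID, in which case this driver may not be the right one.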
2021-06-30
Configuring VNC for remote desktop connections on Ubuntu 16.04

1. Install

    sudo apt-get install xfce4 vnc4server xrdp

2. Start vncserver to initialize it

    vncserver   # the first start asks you to set a login password

If you forget the password, delete the password file in the ~/.vnc/ directory (normally named passwd).

3. Edit the xstartup configuration file

    vim ~/.vnc/xstartup

Replace its contents with the following:

    #!/bin/sh

    # Uncomment the following two lines for normal desktop:
    # unset SESSION_MANAGER
    # exec /etc/X11/xinit/xinitrc

    #[ -x /etc/vnc/xstartup ] && exec /etc/vnc/xstartup
    #[ -r $HOME/.Xresources ] && xrdb $HOME/.Xresources
    #xsetroot -solid grey
    #vncconfig -iconic &
    #x-terminal-emulator -geometry 80x24+10+10 -ls -title "$VNCDESKTOP Desktop" &
    #x-window-manager &

    unset SESSION_MANAGER
    unset DBUS_SESSION_BUS_ADDRESS
    [ -x /etc/vnc/xstartup ] && exec /etc/vnc/xstartup
    [ -r $HOME/.Xresources ] && xrdb $HOME/.Xresources
    vncconfig -iconic &
    xfce4-session &

4. Restart vncserver and xrdp

    sudo vncserver -kill :1    # stop the running vncserver
    vncserver                  # start vncserver again
    sudo service xrdp restart  # restart xrdp

5. Connect

References
Ubuntu 16.04配置VNC进行远程桌面连接(示例代码): https://www.136.la/nginx/show-36314.html
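When connecting with a VNC client directly (rather than through xrdp), the client needs a TCP port, not a display number. By the usual VNC convention, display :N listens on port 5900 + N, so the :1 display used above corresponds to 5901. A minimal sketch of that mapping follows; the helper name `vnc_port` is an assumption made for illustration.

```python
VNC_BASE_PORT = 5900  # usual VNC convention: display :N listens on TCP port 5900 + N


def vnc_port(display):
    """Convert a display such as ':1' (or the integer 1) to its TCP port."""
    number = int(str(display).lstrip(":"))
    if number < 0:
        raise ValueError("display number must be non-negative")
    return VNC_BASE_PORT + number


print(vnc_port(":1"))
```

A firewall rule or SSH tunnel for the session would target that port.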
2021-06-29
Installing TensorFlow GPU on Jetson Nano

1. Prerequisites and Dependencies

Before you install TensorFlow for Jetson, ensure you:

Install JetPack on your Jetson device.

Install system packages required by TensorFlow:

    $ sudo apt-get update
    $ sudo apt-get install libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev liblapack-dev libblas-dev gfortran

Install and upgrade pip3.

    $ sudo apt-get install python3-pip
    $ sudo pip3 install -U pip testresources setuptools==49.6.0

Install the Python package dependencies.

    $ sudo pip3 install -U numpy==1.19.4 future==0.18.2 mock==3.0.5 h5py==2.10.0 keras_preprocessing==1.1.1 keras_applications==1.0.8 gast==0.2.2 futures protobuf pybind11

2. Installing TensorFlow

Note: As of the 20.02 TensorFlow release, the package name has changed from tensorflow-gpu to tensorflow. See the section on Upgrading TensorFlow for more information.

Install TensorFlow using the pip3 command. This command will install the latest version of TensorFlow compatible with JetPack 4.5.

    $ sudo pip3 install --pre --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v45 tensorflow

Note: TensorFlow version 2 was recently released and is not fully backward compatible with TensorFlow 1.x. If you would prefer to use a TensorFlow 1.x package, it can be installed by specifying the TensorFlow version to be less than 2, as in the following command:

    $ sudo pip3 install --pre --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v45 'tensorflow<2'

If you want to install the latest version of TensorFlow supported by a particular version of JetPack, issue the following command:

    $ sudo pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v$JP_VERSION tensorflow

Where:
JP_VERSION: The major and minor version of JetPack you are using, such as 42 for JetPack 4.2.2 or 33 for JetPack 3.3.1.

If you want to install a specific version of TensorFlow, issue the following command:

    $ sudo pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v$JP_VERSION tensorflow==$TF_VERSION+nv$NV_VERSION

Where:
JP_VERSION: The major and minor version of JetPack you are using, such as 42 for JetPack 4.2.2 or 33 for JetPack 3.3.1.
TF_VERSION: The released version of TensorFlow, for example, 1.13.1.
NV_VERSION: The monthly NVIDIA container version of TensorFlow, for example, 19.01.

Note: The version of TensorFlow you are trying to install must be supported by the version of JetPack you are using. Also, the package name may be different for older releases. See the TensorFlow For Jetson Platform Release Notes for a list of some recent TensorFlow releases with their corresponding package names, as well as NVIDIA container and JetPack compatibility.

For example, to install TensorFlow 1.13.1 as of the 19.03 release, the command would look similar to the following:

    $ sudo pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v42 tensorflow-gpu==1.13.1+nv19.3

Checking whether TensorFlow GPU is available

For tensorflow-gpu 1.x.x (for example, tensorflow-gpu 1.2.0), use the following code:

    import tensorflow as tf
    tf.test.is_gpu_available()

For tensorflow-gpu 2.x.x (for example, tensorflow-gpu 2.2.0), use the following code:

    import tensorflow as tf
    tf.config.list_physical_devices('GPU')

References
Installing TensorFlow For Jetson Platform: https://docs.nvidia.com/deeplearning/frameworks/install-tf-jetson-platform/index.html
Tensorflow-GPU测试是否可用: https://www.jianshu.com/p/8eb7e03a9163
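The JP_VERSION / TF_VERSION / NV_VERSION substitution described above can be sketched as a small helper that assembles the pip3 command. This is only an illustration: the function names `jp_version_code` and `pip_install_command` are hypothetical, and the helper builds the argument list without running anything or checking that the requested combination actually exists on NVIDIA's index.

```python
NVIDIA_INDEX = "https://developer.download.nvidia.com/compute/redist/jp/v{jp}"


def jp_version_code(jetpack_release):
    """Keep only major and minor: '4.5' -> '45', '4.2.2' -> '42'."""
    parts = jetpack_release.split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts):
        raise ValueError("expected a release such as '4.5' or '4.2.2'")
    return parts[0] + parts[1]


def pip_install_command(jetpack_release, tf_version=None, nv_version=None,
                        package="tensorflow"):
    """Build the pip3 argument list (hypothetical helper; does not run it)."""
    url = NVIDIA_INDEX.format(jp=jp_version_code(jetpack_release))
    spec = package
    if tf_version is not None:
        spec += "==" + tf_version
        if nv_version is not None:
            spec += "+nv" + nv_version
    return ["sudo", "pip3", "install", "--extra-index-url", url, spec]


# Reproduces the 1.13.1 / 19.03 example from the text.
cmd = pip_install_command("4.2.2", "1.13.1", "19.3", package="tensorflow-gpu")
print(" ".join(cmd))
```

Returning a list rather than a string keeps the result usable with `subprocess.run` without shell quoting concerns.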
2021-06-26
Edge computing device list

1. Jetson Nano

Where to buy and prices
JD.com 1: 丽台京东自营旗舰店 (Leadtek JD self-operated flagship store)
Link: https://item.jd.com/100007523969.html#crumb-wrap
Price: 859
JD.com 2: 风火轮智能硬件专营店
Link: https://item.jd.com/43596671885.html#none
Price: 899
微雪电子 (Waveshare)
Link: https://www.waveshare.net/shop/Jetson-Nano-Developer-Kit-B01.htm
Price: 782.50

Specifications
Item: Jetson Nano
AI performance: 472 GFLOPs
GPU: 128-core Maxwell
CPU: Quad-core ARM A57 @ 1.43 GHz
Memory: 4 GB 64-bit LPDDR4, 25.6 GB/s
Storage: micro SD card (sold separately; optional: Micro SD Card 64GB)
Video encode: 4K @ 30 or 4x 1080p @ 30 or 9x 720p @ 30 (H.264/H.265)
Video decode: 4K @ 60 or 2x 4K @ 30 or 8x 1080p @ 30 or 18x 720p @ 30 (H.264/H.265)
Camera: 2x MIPI CSI-2 DPHY lanes
Connectivity: Gigabit Ethernet, M.2 Key E expansion slot (can take an AC8265 dual-mode wireless card)
Display: HDMI and DP
USB: 4x USB 3.0, USB 2.0 Micro-B
Expansion interfaces: GPIO, I2C, I2S, SPI, UART
Other: 260-pin connector
Power: 5 W / 10 W

2. Jetson TX2

Where to buy and prices
JD.com 1: 中天晨拓数码专营店
Link: https://item.jd.com/57288701121.html#crumb-wrap
Price: 4100
JD.com 2: 风火轮智能硬件专营店
Link: https://item.jd.com/42504341472.html#crumb-wrap
Price: 4100
微雪电子 (Waveshare)
Link: https://www.waveshare.net/shop/Jetson-TX2-Developer-Kit.htm
Price: 4189.50

Specifications
Item: Jetson TX2
AI performance: 1.3 TFLOPs
GPU: 256-core NVIDIA Pascal™ GPU
CPU: Dual-Core NVIDIA Denver 2 64-Bit CPU and Quad-Core ARM® Cortex®-A57 MPCore
Memory: 8GB 128-bit LPDDR4 Memory
Storage: 32GB eMMC 5.1
Video: encode 4K x 2K 60 Hz (HEVC); decode 4K x 2K 60 Hz (12-bit support)
Network: Gigabit Ethernet, WiFi, Bluetooth
CSI: 12x CSI-2 D-PHY 1.2 (Up to 30 GB/s)
Display: Two Multi-Mode DP 1.2, eDP 1.4, HDMI 2.0, Two 1x4 DSI (1.5Gbps/lane)
PCIe: Gen 2, 1x4 + 1x1 or 2x1 + 1x2
Power: 7.5 W / 15 W

3. Jetson Xavier NX

Where to buy and prices
JD.com: 英伟达比格专卖店
Link: https://item.jd.com/10023731874172.html
Price: 3899.00
微雪电子 (Waveshare)
Link: https://www.waveshare.net/shop/Jetson-Xavier-NX-Developer-Kit.htm
Price: 3675

Specifications
Item: Jetson Xavier NX
AI performance: 21 TOPS
GPU: NVIDIA Volta architecture with 384 NVIDIA CUDA cores and 48 Tensor cores
CPU: 6-core NVIDIA Carmel ARM v8.2 64-bit CPU, 6 MB L2 + 4 MB L3
DL accelerator: 2x NVDLA Engines
Vision accelerator: 7-Way VLIW Vision Processor
Memory: 8 GB 128-bit LPDDR4x @ 51.2GB/s
Storage: Micro SD, sold separately
Video encode: 2x 4K @ 30 or 6x 1080p @ 60 or 14x 1080p @ 30 (H.265/H.264)
Video decode: 2x 4K @ 60 or 4x 4K @ 30 or 12x 1080p @ 60 or 32x 1080p @ 30 (H.265); 2x 4K @ 30 or 6x 1080p @ 60 or 16x 1080p @ 30 (H.264)
Camera: 2x MIPI CSI-2 DPHY lanes
Network: Gigabit Ethernet, M.2 Key E (WiFi/BT included), M.2 Key M (NVMe)
Display: HDMI and DisplayPort
USB: 4x USB 3.1, USB 2.0 Micro-B
Other: GPIO, I2C, I2S, SPI, UART
Dimensions: 103 x 90.5 x 34.66 mm
Power: not given

4. Jetson AGX Xavier

Where to buy and prices
JD.com 1: 中天晨拓数码专营店
Link: https://item.jd.com/35577062547.html
Price: 6658.00
JD.com 2: 丽台京东自营旗舰店 (Leadtek JD self-operated flagship store)
Link: https://item.jd.com/100007523939.html
Price: 5799.00
微雪电子 (Waveshare)
Link: https://www.waveshare.net/shop/Jetson-AGX-Xavier-Developer-Kit.htm
Price: 5596.50

Specifications
Item: Jetson AGX Xavier
AI performance: 32 TOPS
GPU: 512-core Volta GPU (with 64 Tensor cores), 11 TFLOPS (FP16), 22 TOPS (INT8)
CPU: 8-core ARM v8.2 64-bit CPU, 8 MB L2 + 4 MB L3
Memory: 32GB 256-bit LPDDR4x, 137GB/s
Storage: 32GB eMMC 5.1
DL accelerator: (2x) NVDLA engines, 5 TFLOPS (FP16), 10 TOPS (INT8)
Vision accelerator: 7-way VLIW vision processor
Video encode/decode: (2x) 4Kp60, HEVC / (2x) 4Kp60, 12-bit support
Dimensions: 105 mm x 105 mm x 65 mm
Onboard module: Jetson AGX Xavier
Power: 10 W / 15 W / 30 W

5. Hikvision (海康威视) 4K camera

Where to buy and prices
JD.com 1: 海康威视京东自营旗舰店 (Hikvision JD self-operated flagship store)
Link: https://item.jd.com/35577062547.html
Price: 618

6. Hikvision (海康威视) 1K camera

Where to buy and prices
JD.com 1: 海康威视京东自营旗舰店 (Hikvision JD self-operated flagship store)
Link: https://item.jd.com/100008757357.html
Price: 286
2021-06-18
Setting a proxy for apt-get on Ubuntu

1. Environment variable

Set the environment variable. The following is a temporary setting:

    export http_proxy=http://127.0.0.1:8000
    sudo apt-get update

2. apt-get configuration

Edit /etc/apt/apt.conf (or /etc/environment) and add:

    Acquire::http::proxy "http://127.0.0.1:8000/";
    Acquire::ftp::proxy "ftp://127.0.0.1:8000/";
    Acquire::https::proxy "https://127.0.0.1:8000/";

3. Pass the proxy on the command line

This is my favorite method, since apt is not something you run all the time. Add the -o option to the command:

    sudo apt-get -o Acquire::http::proxy="http://127.0.0.1:8000/" update
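Methods 2 and 3 use the same `Acquire::<scheme>::proxy` setting, once as a configuration line and once as a -o option. A minimal sketch that generates both forms from one mapping follows; the helper names `render_apt_conf` and `apt_get_with_proxy` are assumptions for illustration, and the code only builds strings, it does not write files or run apt-get.

```python
def render_apt_conf(proxies):
    """Render Acquire::<scheme>::proxy lines in the apt.conf form of method 2."""
    return "".join(
        'Acquire::{}::proxy "{}";\n'.format(scheme, url)
        for scheme, url in proxies.items()
    )


def apt_get_with_proxy(proxies, *apt_args):
    """Build an apt-get argument list that passes each proxy with -o (method 3)."""
    argv = ["sudo", "apt-get"]
    for scheme, url in proxies.items():
        # In list form no shell quoting is needed around the URL.
        argv += ["-o", "Acquire::{}::proxy={}".format(scheme, url)]
    return argv + list(apt_args)


proxies = {"http": "http://127.0.0.1:8000/"}
print(render_apt_conf(proxies), end="")
print(" ".join(apt_get_with_proxy(proxies, "update")))
```

Keeping the proxy mapping in one place makes it easy to switch between the persistent and the one-off form.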
2021-04-10
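A quick guide to journal index categories

Foreign-language journals

No. | Index | Full name | Owner | Founded by | Year founded | Notes
1 | SCI | Science Citation Index | Clarivate Analytics | Institute for Scientific Information (ISI), USA | 1964 | (1) Covers journals in natural science, engineering and technology, biomedicine and many other disciplines. (2) Covers more than 9,000 of the most influential core academic journals across research fields.
2 | SSCI | Social Sciences Citation Index | Clarivate Analytics | Institute for Scientific Information (ISI), USA | 1973 | Covers 55 fields including anthropology, law, economics, history, geography and psychology; about 3,500 journals.
3 | A&HCI | Arts & Humanities Citation Index | Clarivate Analytics | Institute for Scientific Information (ISI), USA | 1978 | An important abstract and index database for the arts and humanities, covering more than 1,800 journals in archaeology, architecture, art, literature, philosophy, religion, history and related fields.
4 | ESCI | Emerging Sources Citation Index | Clarivate Analytics | Institute for Scientific Information (ISI), USA | 2015 | Lists a set of high-quality new journals that are under observation, helping researchers follow emerging trends; updated irregularly.
5 | CPCI | Conference Proceedings Citation Index | Clarivate Analytics | Institute for Scientific Information (ISI), USA | 1978 | (1) Indexes papers from nearly 10,000 international science and technology conferences per year since 1990. (2) Provides abstracts of proceedings papers since 1997; updated weekly.
6 | EI | The Engineering Index | Elsevier | Engineering Information Inc., USA | 1884 | (1) The most comprehensive secondary-literature database in engineering. (2) Covers high-quality literature in civil engineering, architectural engineering, transportation, applied science and other fields.
7 | CA | Chemical Abstracts | Chemical Abstracts Service, American Chemical Society | American Chemical Society | 1907 | (1) The world's largest chemical abstracts database. (2) The most important retrieval tool for chemistry, chemical engineering and related disciplines.
8 | JST | Japan Science & Technology Corporation database | Japan Science and Technology Agency | Japan Science and Technology Agency | 2007 | (1) An online edition developed from Japan's 《科学技术文献速报》. (2) The agency reports to Japan's Ministry of Education, Culture, Sports, Science and Technology and is Japan's most important science and technology information organization.
9 | AJ | Abstract Journal | All-Russian Institute for Scientific and Technical Information | All-Russian Institute for Scientific and Technical Information | 1953 | (1) Focuses on natural science, technical science and industrial economics. (2) One of the world's five major comprehensive abstract journals.
10 | ISR | Index to Scientific Reviews | Clarivate Analytics | Institute for Scientific Information (ISI), USA | 1974 | Indexes valuable review articles from more than 2,700 science and technology journals and more than 300 monograph series worldwide.

Chinese journals (the four major indexes)

No. | Index | Full name | Owner | Founded by | Year founded | Notes
1 | CSCD | Chinese Science Citation Database (中国科学引文数据库) | National Science Library, Chinese Academy of Sciences (Chinese Academy of Sciences library) | Chinese Academy of Sciences | 1989 | (1) China's first citation database, known as "China's SCI". (2) The first non-English database on the ISI Web of Knowledge platform.
2 | CSSCI | Chinese Social Sciences Citation Index (中国社会科学引文索引) | Chinese Social Sciences Research Evaluation Center, Nanjing University | Nanjing University and Hong Kong University of Science and Technology | 1997 | (1) A key research project of the state and the Ministry of Education. (2) A landmark project in the evaluation of humanities and social sciences in China.
3 | 北大核心 (PKU Core) | A Guide to the Core Journals of China (中文核心期刊要目总览) | Peking University Press | Peking University | 1992 | A research project carried out by journal staff from the Peking University Library and the libraries of more than ten other Beijing universities, together with experts from related institutions.
4 | 中信所核心 (ISTIC Core) | Chinese science and technology paper statistical source journals (中国科技论文统计源期刊) | Institute of Scientific and Technical Information of China | Institute of Scientific and Technical Information of China | 1980 | Commissioned by the Ministry of Science and Technology; modeled on the Journal Citation Reports (JCR) of the Institute for Scientific Information (ISI) and adapted to conditions in China.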
2021-03-30
Scene Text Detection Resources [Repost] [Translation]

1. Datasets

1.1 Horizontal-text datasets

ICDAR 2003 (IC03)
Introduction: It contains 509 images in total, 258 for training and 251 for testing. Specifically, it contains 1110 text instances in the training set and 1156 in the test set. It has word-level annotations. IC03 only considers English text instances.
Link: IC03-download

ICDAR 2011 (IC11)
Introduction: IC11 is an English dataset for text detection. It contains 484 images, 229 for training and 255 for testing. There are 1564 text instances in this dataset. It provides both word-level and character-level annotations.
Link: IC11-download

ICDAR 2013 (IC13)
Introduction: IC13 is almost the same as IC11. It contains 462 images in total, 229 for training and 233 for testing. Specifically, it contains 849 text instances in the training set and 1095 in the test set.
Link: IC13-download

1.2 Arbitrary-quadrilateral text datasets

USTB-SV1K
Introduction: USTB-SV1K is an English dataset. It contains 1000 street images from Google Street View with 2955 text instances in total. It only provides word-level annotations.
Link: USTB-SV1K-download

SVT
Introduction: It contains 350 images with 725 English text instances in total. SVT has both character-level and word-level annotations. The SVT images were harvested from Google Street View and have low resolution.
Link: SVT-download

SVT-P
Introduction: It contains 639 cropped word images for testing. The images were selected from side-view snapshots in Google Street View, so most of them are heavily distorted by the non-frontal viewing angle. It is an improved dataset of SVT.
Link: SVT-P-download (Password : vnis)

ICDAR 2015 (IC15)
Introduction: It contains 1500 images in total, 1000 for training and 500 for testing. Specifically, it contains 17548 text instances. It provides word-level annotations. IC15 is the first incidental scene text dataset and only considers English words.
Link: IC15-download

COCO-Text
Introduction: It contains 63686 images in total, 43686 for training, 10000 for validation and 10000 for testing. Specifically, it contains 145859 cropped word images for testing, including handwritten and printed, clear and blurry, English and non-English text.
Link: COCO-Text-download

MSRA-TD500
Introduction: It contains 500 images in total. It provides text-line-level annotations rather than word-level ones, and polygon boxes rather than axis-aligned rectangles for text region annotation. It contains both English and Chinese text instances.
Link: MSRA-TD500-download

MLT 2017
Introduction: It contains 10000 natural images in total. It provides word-level annotations. MLT covers 9 languages. It is a more realistic and complex dataset for scene text detection and recognition.
Link: MLT-download

MLT 2019
Introduction: It contains 18000 images in total. It provides word-level annotations. Compared to MLT, this dataset covers 10 languages. It is a more realistic and complex dataset for scene text detection and recognition.
Link: MLT-2019-download

CTW
Introduction: It contains 32285 high-resolution street view images of Chinese text, with 1018402 character instances in total. All images are annotated at the character level, including the underlying character type, the bounding box and 6 other attributes. These attributes indicate whether the background is complex, and whether the character is raised, handwritten or printed, occluded, distorted, or rendered as word art.
Link: CTW-download

RCTW-17
Introduction: It contains 12514 images in total, 11514 for training and 1000 for testing. Most images in RCTW-17 were collected with cameras or mobile phones, and the others are generated images. Text instances are annotated with parallelograms. It is the first large-scale Chinese dataset, and was the largest published dataset at the time.
Link: RCTW-17-download

ReCTS
Introduction: This is a large-scale Chinese street view signboard dataset. It is labeled at the Chinese word level and the Chinese text-line level, using arbitrary quadrilaterals. It contains 20000 images in total.
Link: ReCTS-download

1.3 Irregular-text datasets

CUTE80
Introduction: It contains 80 high-resolution images taken in natural scenes. Specifically, it contains 288 cropped word images for testing. The dataset focuses on curved text. No lexicon is provided.
Link: CUTE80-download

Total-Text
Introduction: It contains 1,555 images in total. Specifically, it contains 11459 cropped word images with three different text orientations: horizontal, multi-oriented and curved.
Link: Total-Text-download

SCUT-CTW1500
Introduction: It contains 1500 images in total, 1000 for training and 500 for testing. Specifically, it contains 10751 cropped word images for testing. Annotations in CTW-1500 are polygons with 14 vertices. The dataset mainly consists of Chinese and English text.
Link: CTW-1500-download

LSVT
Introduction: LSVT consists of 20,000 testing images, 30,000 fully annotated training images and 400,000 weakly annotated training images, which are referred to as partial labels. The labeled text regions show the diversity of text: horizontal, multi-oriented and curved.
Link: LSVT-download

ArT
Introduction: ArT contains 10,166 images, 5,603 for training and 4,563 for testing. They were collected with text shape diversity in mind, and all text shapes are well represented in ArT.
Link: ArT-download

1.4 Synthetic datasets

Synth80k
Introduction: It contains 800 thousand images with approximately 8 million synthetic word instances. Each text instance is annotated with its text string and with word-level and character-level bounding boxes.
Link: Synth80k-download

SynthText
Introduction: It contains 6 million cropped word images. The generation process is similar to that of Synth90k. It is also annotated in a horizontal style.
Link: SynthText-download

1.5 Comparison of datasets

Columns: Dataset | Language | Images (Total | Train | Test) | Text instances (Total | Train | Test) | Text shape (Horizontal | Arbitrary-Quadrilateral | Multi-oriented) | Annotation level (Char | Word | Text-Line)

IC03 | English | 509 | 258 | 251 | 2266 | 1110 | 1156 | ✓ | ✕ | ✕ | ✕ | ✓ | ✕
IC11 | English | 484 | 229 | 255 | 1564 | ~ | ~ | ✓ | ✕ | ✕ | ✓ | ✓ | ✕
IC13 | English | 462 | 229 | 233 | 1944 | 849 | 1095 | ✓ | ✕ | ✕ | ✓ | ✓ | ✕
USTB-SV1K | English | 1000 | 500 | 500 | 2955 | ~ | ~ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕
SVT | English | 350 | 100 | 250 | 725 | 211 | 514 | ✓ | ✓ | ✕ | ✓ | ✓ | ✕
SVT-P | English | 238 | ~ | ~ | 639 | ~ | ~ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕
IC15 | English | 1500 | 1000 | 500 | 17548 | 12318 | 5230 | ✓ | ✓ | ✕ | ✕ | ✓ | ✕
COCO-Text | English | 63686 | 43686 | 20000 | 145859 | 118309 | 27550 | ✓ | ✓ | ✕ | ✕ | ✓ | ✕
MSRA-TD500 | English/Chinese | 500 | 300 | 200 | ~ | ~ | ~ | ✓ | ✓ | ✕ | ✕ | ✕ | ✓
MLT 2017 | Multi-lingual | 18000 | 7200 | 10800 | ~ | ~ | ~ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕
MLT 2019 | Multi-lingual | 20000 | 10000 | 10000 | ~ | ~ | ~ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕
CTW | Chinese | 32285 | 25887 | 6398 | 1018402 | 812872 | 205530 | ✓ | ✓ | ✕ | ✓ | ✓ | ✕
RCTW-17 | English/Chinese | 12514 | 11514 | 1000 | ~ | ~ | ~ | ✓ | ✓ | ✕ | ✕ | ✕ | ✓
ReCTS | Chinese | 20000 | ~ | ~ | ~ | ~ | ~ | ✓ | ✓ | ✕ | ✓ | ✓ | ✕
CUTE80 | English | 80 | ~ | ~ | ~ | ~ | ~ | ✕ | ✕ | ✓ | ✕ | ✓ | ✓
Total-Text | English | 1525 | 1225 | 300 | 9330 | ~ | ~ | ✓ | ✓ | ✓ | ✕ | ✓ | ✓
CTW-1500 | English/Chinese | 1500 | 1000 | 500 | 10751 | ~ | ~ | ✓ | ✓ | ✓ | ✕ | ✓ | ✓
LSVT | English/Chinese | 450000 | 430000 | 20000 | ~ | ~ | ~ | ✓ | ✓ | ✓ | ✕ | ✓ | ✓
ArT | English/Chinese | 10166 | 5603 | 4563 | ~ | ~ | ~ | ✓ | ✓ | ✓ | ✕ | ✓ | ✕
Synth80k | English | 80k | ~ | ~ | 8m | ~ | ~ | ✓ | ✕ | ✕ | ✓ | ✓ | ✕
SynthText | English | 800k | ~ | ~ | 6m | ~ | ~ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕

2. Summary of scene text detection resources

2.1 Comparison of methods

Scene text detection methods can be divided into four categories: (a) traditional methods; (b) segmentation-based methods; (c) regression-based methods; (d) hybrid methods.

Notes: (1) "Hori" stands for horizontal scene text datasets. (2) "Quad" stands for arbitrary-quadrilateral text datasets. (3) "Irreg" stands for irregular scene text datasets. (4) "Traditional method" stands for methods that do not rely on deep learning.

2.1.1 Traditional methods

Columns: Method | Model | Code | Hori | Quad | Irreg | Source, Time | Highlight

Yao et al. [1] | TD-Mixture | ✕ | ✓ | ✓ | ✕ | CVPR 2012 | 1) A new dataset MSRA-TD500 and protocol for evaluation. 2) Equipped a two-level classification scheme and two sets of feature extractors.
Yin et al. [2] | | ✕ | ✓ | ✕ | ✕ | TPAMI 2013 | Extract Maximally Stable Extremal Regions (MSERs) as character candidates and group them together.
Le et al. [5] | HOCC | ✕ | ✓ | ✓ | ✕ | CVPR 2014 | HOCC + MSERs
Yin et al. [7] | | ✕ | ✓ | ✓ | ✕ | TPAMI 2015 | Presenting a unified distance metric learning framework for adaptive hierarchical clustering.
Wu et al. [9] | | ✕ | ✓ | ✓ | ✕ | TMM 2015 | Exploring gradient directional symmetry at component level for smoothing edge components before text detection.
Tian et al. [17] | | ✕ | ✓ | ✕ | ✕ | IJCAI 2016 | Scene text is first detected locally in individual frames and finally linked by an optimal tracking trajectory.
Yang et al. [33] | | ✕ | ✓ | ✓ | ✕ | TIP 2017 | A text detector locates character candidates and extracts text regions, which are then linked by an optimal tracking trajectory.
Liang et al. [8] | | ✕ | ✓ | ✓ | ✓ | TIP 2015 | Exploring maxima stable extreme regions along with stroke width transform for detecting candidate text regions.
Michal et al. [12] | FASText | ✕ | ✓ | ✓ | ✕ | ICCV 2015 | Stroke keypoints are efficiently detected and then exploited to obtain stroke segmentations.

2.1.2 Segmentation-based methods

Columns: Method | Model | Code | Hori | Quad | Irreg | Source, Time | Highlight

Li et al. [3] | | ✕ | ✓ | ✓ | ✕ | TIP 2014 | (1) Develop three novel cues that are tailored for character detection and a Bayesian method for their integration; (2) design a Markov random field model to exploit the inherent dependencies between characters.
Zhang et al. [14] | | ✕ | ✓ | ✓ | ✕ | CVPR 2016 | Utilizing FCN for salient map detection and centroid of each character prediction.
Zhu et al. [16] | | ✕ | ✓ | ✓ | ✕ | CVPR 2016 | Performs a graph-based segmentation of connected components into words (Word-Graph).
He et al. [18] | Text-CNN | ✕ | ✓ | ✓ | ✕ | TIP 2016 | Developing a new learning mechanism to train the Text-CNN with multi-level and rich supervised information.
Yao et al. [21] | | ✕ | ✓ | ✓ | ✕ | arXiv 2016 | Proposing to localize text in a holistic manner, by casting scene text detection as a semantic segmentation problem.
Hu et al. [27] | WordSup | ✕ | ✓ | ✓ | ✕ | ICCV 2017 | Proposing a weakly supervised framework that can utilize word annotations. Then the detected characters are fed to a text structure analysis module.
Wu et al. [28] | | ✕ | ✓ | ✓ | ✕ | ICCV 2017 | Introducing the border class to the text detection problem for the first time, and validating that the decoding process is largely simplified with the help of text border.
Tang et al. [32] | | ✕ | ✓ | ✕ | ✕ | TIP 2017 | A text-aware candidate text region (CTR) extraction model + CTR refinement model.
Dai et al. [35] | FTSN | ✕ | ✓ | ✓ | ✕ | arXiv 2017 | Detecting and segmenting the text instance jointly and simultaneously, leveraging merits from both semantic segmentation task and region proposal based object detection task.
Wang et al. [38] | | ✕ | ✓ | ✕ | ✕ | ICDAR 2017 | This paper proposes a novel character candidate extraction method based on super-pixel segmentation and hierarchical clustering.
Deng et al. [40] | PixelLink | ✓ | ✓ | ✓ | ✕ | AAAI 2018 | Text instances are first segmented out by linking pixels within the same instance together.
Liu et al. [42] | MCN | ✕ | ✓ | ✓ | ✕ | CVPR 2018 | Stochastic Flow Graph (SFG) + Markov Clustering.
Lyu et al. [43] | | ✕ | ✓ | ✓ | ✕ | CVPR 2018 | Detect scene text by localizing corner points of text bounding boxes and segmenting text regions in relative positions.
Chu et al. [45] | Border | ✕ | ✓ | ✓ | ✕ | ECCV 2018 | The paper presents a novel scene text detection technique that makes use of semantics-aware text borders and bootstrapping based text segment augmentation.
Long et al. [46] | TextSnake | ✕ | ✓ | ✓ | ✓ | ECCV 2018 | The paper proposes TextSnake, which is able to effectively represent text instances in horizontal, oriented and curved forms based on symmetry axis.
Yang et al. [47] | IncepText | ✕ | ✓ | ✓ | ✕ | IJCAI 2018 | Designing a novel Inception-Text module and introducing deformable PSROI pooling to deal with multi-oriented text detection.
Yue et al. [48] | | ✕ | ✓ | ✓ | ✕ | BMVC 2018 | Proposing a general framework for text detection called Guided CNN to achieve the two goals simultaneously.
Zhong et al. [53] | AF-RPN | ✕ | ✓ | ✓ | ✕ | arXiv 2018 | Presenting AF-RPN as an anchor-free and scale-friendly region proposal network for the Faster R-CNN framework.
Wang et al. [54] | PSENet | ✓ | ✓ | ✓ | ✓ | CVPR 2019 | Proposing a novel Progressive Scale Expansion Network (PSENet), designed as a segmentation-based detector with multiple predictions for each text instance.
Xu et al. [57] | TextField | ✕ | ✓ | ✓ | ✓ | arXiv 2018 | Presenting a novel direction field which can represent scene texts of arbitrary shapes.
Tian et al. [58] | FTDN | ✕ | ✓ | ✓ | ✕ | ICIP 2018 | FTDN is able to segment text region and simultaneously regress text box at pixel-level.
Tian et al. [83] | | ✕ | ✓ | ✓ | ✓ | CVPR 2019 | Constraining embedding feature of pixels inside the same text region to share similar properties.
Huang et al. [4] | MSERs-CNN | ✕ | ✓ | ✕ | ✕ | ECCV 2014 | Combining MSERs with CNN
Sun et al. [6] | | ✕ | ✓ | ✕ | ✕ | PR 2015 | Presenting a robust text detection approach based on color-enhanced CER and neural networks.
Baek et al. [62] | CRAFT | ✕ | ✓ | ✓ | ✓ | CVPR 2019 | Proposing CRAFT, which effectively detects text areas by exploring each character and the affinity between characters.
Richardson et al. [87] | | ✕ | ✓ | ✓ | ✕ | WACV 2019 | Presenting an additional scale predictor to estimate the better scale of text regions for testing.
Wang et al. [88] | SAST | ✕ | ✓ | ✓ | ✓ | ACMM 2019 | Presenting a context attended multi-task learning framework for scene text detection.
Wang et al. [90] | PAN | ✕ | ✓ | ✓ | ✓ | ICCV 2019 | Proposing an efficient and accurate arbitrary-shaped text detector called Pixel Aggregation Network (PAN).

2.1.3 Regression-based methods

Columns: Method | Model | Code | Hori | Quad | Irreg | Source, Time | Highlight

Gupta et al. [15] | FCRN | ✓ | ✓ | ✕ | ✕ | CVPR 2016 | (a) Proposing a fast and scalable engine to generate synthetic images of text in clutter; (b) FCRN.
Zhong et al. [20] | DeepText | ✕ | ✓ | ✕ | ✕ | arXiv 2016 | (a) Inception-RPN; (b) Utilize ambiguous text category (ATC) information and multilevel region-of-interest pooling (MLRP).
Liao et al. [22] | TextBoxes | ✓ | ✓ | ✕ | ✕ | AAAI 2017 | Mainly based on the SSD object detection framework.
Liu et al. [25] | DMPNet | ✕ | ✓ | ✓ | ✕ | CVPR 2017 | Quadrilateral sliding windows + shared Monte-Carlo method for fast and accurate computing of the polygonal areas + a sequential protocol for relative regression.
He et al. [26] | DDR | ✕ | ✓ | ✓ | ✕ | ICCV 2017 | Proposing an FCN that has bi-task outputs where one is pixel-wise classification between text and non-text, and the other is direct regression to determine the vertex coordinates of quadrilateral text boundaries.
Jiang et al. [36] | R2CNN | ✕ | ✓ | ✓ | ✕ | arXiv 2017 | Using the Region Proposal Network (RPN) to generate axis-aligned bounding boxes that enclose the texts with different orientations.
Xing et al. [37] | ArbiText | ✕ | ✓ | ✓ | ✕ | arXiv 2017 | Adopting the circle anchors and incorporating a pyramid pooling module into the Single Shot MultiBox Detector framework.
Zhang et al. [39] | FEN | ✕ | ✓ | ✕ | ✕ | AAAI 2018 | Proposing a refined scene text detector with a novel Feature Enhancement Network (FEN) for Region Proposal and Text Detection Refinement.
Wang et al. [41] | ITN | ✕ | ✓ | ✓ | ✕ | CVPR 2018 | ITN is presented to learn the geometry-aware representation encoding the unique geometric configurations of scene text instances with in-network transformation embedding.
Liao et al. [44] | RRD | ✕ | ✓ | ✓ | ✕ | CVPR 2018 | The regression branch extracts rotation-sensitive features, while the classification branch extracts rotation-invariant features by pooling the rotation sensitive features.
Liao et al. [49] | TextBoxes++ | ✓ | ✓ | ✓ | ✕ | TIP 2018 | Mainly based on the SSD object detection framework; it replaces the rectangular box representation in conventional object detectors by a quadrilateral or oriented rectangle representation.
He et al. [50] | | ✕ | ✓ | ✓ | ✕ | TIP 2018 | Proposing a scene text detection framework based on fully convolutional network with a bi-task prediction module.
Ma et al. [51] | RRPN | ✓ | ✓ | ✓ | ✕ | TMM 2018 | RRPN + RRoI Pooling.
Zhu et al. [55] | SLPR | ✕ | ✓ | ✓ | ✓ | arXiv 2018 | SLPR regresses multiple points on the edge of text line and then utilizes these points to sketch the outlines of the text.
Deng et al. [56] | | ✓ | ✓ | ✓ | ✕ | arXiv 2018 | CRPN employs corners to estimate the possible locations of text instances. It also designs an embedded data augmentation module inside the region-wise subnetwork.
Cai et al. [59] | FFN | ✕ | ✓ | ✕ | ✕ | ICIP 2018 | Proposing a Feature Fusion Network to deal with text regions differing in enormous sizes.
Sabyasachi et al. [60] | RGC | ✕ | ✓ | ✓ | ✕ | ICIP 2018 | Proposing a novel recurrent architecture to improve the learnings of a feature map at a given time.
Liu et al. [63] | CTD | ✓ | ✓ | ✓ | ✓ | PR 2019 | CTD + TLOC + PNMS
Xie et al. [79] | DeRPN | ✓ | ✓ | ✕ | ✕ | AAAI 2019 | DeRPN utilizes anchor string mechanism instead of anchor box in RPN.
Wang et al. [82] | | ✕ | ✓ | ✓ | ✓ | CVPR 2019 | Text-RPN + RNN
Liu et al. [84] | | ✕ | ✓ | ✓ | ✓ | CVPR 2019 | CSE mechanism
He et al. [29] | SSTD | ✓ | ✓ | ✓ | ✕ | ICCV 2017 | Proposing an attention mechanism. Then developing a hierarchical inception module which efficiently aggregates multi-scale inception features.
Tian et al. [11] | | ✕ | ✓ | ✕ | ✕ | ICCV 2015 | Cascade boosting detects character candidates, and the min-cost flow network model gets the final result.
Tian et al. [13] | CTPN | ✓ | ✓ | ✕ | ✕ | ECCV 2016 | 1) RPN + LSTM. 2) RPN incorporates a new vertical anchor mechanism and LSTM connects the regions to get the final result.
He et al. [19] | | ✕ | ✓ | ✓ | ✕ | ACCV 2016 | ER detector detects regions to get coarse prediction of text regions. Then the local context is aggregated to classify the remaining regions to obtain a final prediction.
Shi et al. [23] | SegLink | ✓ | ✓ | ✓ | ✕ | CVPR 2017 | Decomposing text into segments and links. A link connects two adjacent segments.
Tian et al. [30] | WeText | ✕ | ✓ | ✕ | ✕ | ICCV 2017 | Proposing a weakly supervised scene text detection method (WeText).
Zhu et al. [31] | RTN | ✕ | ✓ | ✕ | ✕ | ICDAR 2017 | Mainly based on the CTPN vertical proposal mechanism.
Ren et al. [34] | | ✕ | ✓ | ✕ | ✕ | TMM 2017 | Proposing a CNN-based detector. It contains a text structure component detector layer, a spatial pyramid layer, and a multi-input-layer deep belief network (DBN).
Zhang et al. [10] | | ✕ | ✓ | ✕ | ✕ | CVPR 2015 | The proposed algorithm exploits the symmetry property of character groups and allows for direct extraction of text lines from natural images.
Wang et al. [86] | DSRN | ✕ | ✓ | ✓ | ✕ | IJCAI 2019 | Presenting a scale-transfer module and scale relationship module to handle the problem of scale variation.
Tang et al. [89] | Seglink++ | ✕ | ✓ | ✓ | ✓ | PR 2019 | Presenting instance aware component grouping (ICG) for arbitrary-shape text detection.
Wang et al. [92] | ContourNet | ✓ | ✓ | ✓ | ✓ | CVPR 2020 | 1. A scale-insensitive Adaptive Region Proposal Network (AdaptiveRPN); 2. Local Orthogonal Texture-aware Module (LOTM).

2.1.4 Hybrid methods

Columns: Method | Model | Code | Hori | Quad | Irreg | Source, Time | Highlight

Tang et al. [52] | SSFT | ✕ | ✓ | ✕ | ✕ | TMM 2018 | Proposing a novel scene text detection method that involves superpixel-based stroke feature transform (SSFT) and deep learning based region classification (DLRC).
Xie et al. [61] | SPCNet | ✕ | ✓ | ✓ | ✓ | AAAI 2019 | Text Context module + Re-Score mechanism.
Liu et al. [64] | PMTD | ✓ | ✓ | ✓ | ✕ | arXiv 2019 | Perform "soft" semantic segmentation. It assigns a soft pyramid label (i.e., a real value between 0 and 1) for each pixel within text instance.
Liu et al. [80] | BDN | ✓ | ✓ | ✓ | ✕ | IJCAI 2019 | Discretizing bounding boxes into key edges to address label confusion for text detection.
Zhang et al. [81] | LOMO | ✕ | ✓ | ✓ | ✓ | CVPR 2019 | DR + IRM + SEM
Zhou et al. [24] | EAST | ✓ | ✓ | ✓ | ✕ | CVPR 2017 | The pipeline directly predicts words or text lines of arbitrary orientations and quadrilateral shapes in full images with instance segmentation.
Yue et al. [48] | | ✕ | ✓ | ✓ | ✕ | BMVC 2018 | Proposing a general framework for text detection called Guided CNN to achieve the two goals simultaneously.
Zhong et al. [53] | AF-RPN | ✕ | ✓ | ✓ | ✕ | arXiv 2018 | Presenting AF-RPN as an anchor-free and scale-friendly region proposal network for the Faster R-CNN framework.
Xue et al. [85] | MSR | ✕ | ✓ | ✓ | ✓ | IJCAI 2019 | Presenting a novel multi-scale regression network.
Liao et al. [91] | DB | ✓ | ✓ | ✓ | ✓ | AAAI 2020 | Presenting differentiable binarization module to adaptively set the thresholds for binarization, which simplifies the post-processing.
Xiao et al. [93] | SDM | ✕ | ✓ | ✓ | ✓ | ECCV 2020 | 1. A novel sequential deformation method; 2. auxiliary character counting supervision.

2.2 Detection results

2.2.1 Detection results on horizontal-text datasets

Columns: Method | Model | Source, Time | IC11 [68] (P R F) | IC13 [69] (P R F) | IC05 [67] (P R F)

Method category: Traditional
Yao et al. [1] | TD-Mixture | CVPR 2012 | ~ ~ ~ | 0.69 0.66 0.67 | ~ ~ ~
Yin et al. [2] | | TPAMI 2013 | 0.86 0.68 0.76 | ~ ~ ~ | ~ ~ ~
Yin et al. [7] | | TPAMI 2015 | 0.838 0.66 0.738 | ~ ~ ~ | ~ ~ ~
Wu et al. [9] | | TMM 2015 | ~ ~ ~ | 0.76 0.70 0.73 | ~ ~ ~
Liang et al. [8] | | TIP 2015 | 0.77 0.68 0.71 | 0.76 0.68 0.72 | ~ ~ ~
Michal et al. [12] | FASText | ICCV 2015 | ~ ~ ~ | 0.84 0.69 0.77 | ~ ~ ~

Method category: Segmentation
Li et al. [3] | | TIP 2014 | 0.80 0.62 0.70 | ~ ~ ~ | ~ ~ ~
Zhang et al. [14] | | CVPR 2016 | ~ ~ ~ | 0.88 0.78 0.83 | ~ ~ ~
He et al. [18] | Text-CNN | TIP 2016 | 0.91 0.74 0.82 | 0.93 0.73 0.82 | 0.87 0.73 0.79
Yao et al. [21] | | arXiv 2016 | ~ ~ ~ | 0.889 0.802 0.843 | ~ ~ ~
Hu et al. [27] | WordSup | ICCV 2017 | ~ ~ ~ | 0.933 0.875 0.903 | ~ ~ ~
Tang et al. [32] | | TIP 2017 | 0.90 0.86 0.88 | 0.92 0.87 0.89 | ~ ~ ~
Wang et al. [38] | | ICDAR 2017 | 0.87 0.78 0.82 | 0.87 0.82 0.84 | ~ ~ ~
Deng et al. [40] | PixelLink | AAAI 2018 | ~ ~ ~ | 0.886 0.875 0.881 | ~ ~ ~
Liu et al. [42] | MCN | CVPR 2018 | ~ ~ ~ | 0.88 0.87 0.88 | ~ ~ ~
Lyu et al. [43] | | CVPR 2018 | ~ ~ ~ | 0.92 0.844 0.880 | ~ ~ ~
Chu et al. [45] | Border | ECCV 2018 | ~ ~ ~ | 0.915 0.871 0.892 | ~ ~ ~
Wang et al. [54] | PSENet | CVPR 2019 | ~ ~ ~ | 0.94 0.90 0.92 | ~ ~ ~
Huang et al. [4] | MSERs-CNN | ECCV 2014 | 0.88 0.71 0.78 | ~ ~ ~ | 0.84 0.67 0.75
Sun et al. [6] | | PR 2015 | 0.92 0.91 0.91 | 0.94 0.92 0.93 | ~ ~ ~

Method category: Regression
Gupta et al. [15] | FCRN | CVPR 2016 | 0.94 0.77 0.85 | 0.938 0.764 0.842 | ~ ~ ~
Zhong et al. [20] | DeepText | arXiv 2016 | 0.87 0.83 0.85 | 0.85 0.81 0.83 | ~ ~ ~
Liao et al. [22] | TextBoxes | AAAI 2017 | 0.89 0.82 0.86 | 0.89 0.83 0.86 | ~ ~ ~
Liu et al. [25] | DMPNet | CVPR 2017 | ~ ~ ~ | 0.93 0.83 0.870 | ~ ~ ~
Jiang et al. [36] | R2CNN | arXiv 2017 | ~ ~ ~ | 0.92 0.81 0.86 | ~ ~ ~
Xing et al. [37] | ArbiText | arXiv 2017 | ~ ~ ~ | 0.826 0.936 0.877 | ~ ~ ~
Wang et al. [41] | ITN | CVPR 2018 | 0.896 0.889 0.892 | 0.941 0.893 0.916 | ~ ~ ~
Liao et al. [49] | TextBoxes++ | TIP 2018 | ~ ~ ~ | 0.92 0.86 0.89 | ~ ~ ~
He et al. [50] | | TIP 2018 | ~ ~ ~ | 0.91 0.84 0.88 | ~ ~ ~
Ma et al. [51] | RRPN | TMM 2018 | ~ ~ ~ | 0.95 0.89 0.91 | ~ ~ ~
Zhu et al. [55] | SLPR | arXiv 2018 | ~ ~ ~ | 0.90 0.72 0.80 | ~ ~ ~
Cai et al. [59] | FFN | ICIP 2018 | ~ ~ ~ | 0.92 0.84 0.876 | ~ ~ ~
Sabyasachi et al. [60] | RGC | ICIP 2018 | ~ ~ ~ | 0.89 0.77 0.83 | ~ ~ ~
Wang et al. [82] | | CVPR 2019 | ~ ~ ~ | 0.937 0.878 0.907 | ~ ~ ~
Liu et al. [84] | | CVPR 2019 | ~ ~ ~ | 0.937 0.897 0.917 | ~ ~ ~
He et al. [29] | SSTD | ICCV 2017 | ~ ~ ~ | 0.89 0.86 0.88 | ~ ~ ~
Tian et al. [11] | | ICCV 2015 | 0.86 0.76 0.81 | 0.852 0.759 0.802 | ~ ~ ~
Tian et al. [13] | CTPN | ECCV 2016 | ~ ~ ~ | 0.93 0.83 0.88 | ~ ~ ~
He et al. [19] | | ACCV 2016 | ~ ~ ~ | 0.90 0.75 0.81 | ~ ~ ~
Shi et al. [23] | SegLink | CVPR 2017 | ~ ~ ~ | 0.877 0.83 0.853 | ~ ~ ~
Tian et al. [30] | WeText | ICCV 2017 | ~ ~ ~ | 0.911 0.831 0.869 | ~ ~ ~
Zhu et al. [31] | RTN | ICDAR 2017 | ~ ~ ~ | 0.94 0.89 0.91 | ~ ~ ~
Ren et al. [34] | | TMM 2017 | 0.78 0.67 0.72 | 0.81 0.67 0.73 | ~ ~ ~
Zhang et al. [10] | | CVPR 2015 | 0.84 0.76 0.80 | 0.88 0.74 0.80 | ~ ~ ~

Method category: Hybrid
Tang et al. [52] | SSFT | TMM 2018 | 0.906 0.847 0.876 | 0.911 0.861 0.885 | ~ ~ ~
Xie et al. [61] | SPCNet | AAAI 2019 | ~ ~ ~ | 0.94 0.91 0.92 | ~ ~ ~
Liu et al. [80] | BDN | IJCAI 2019 | ~ ~ ~ | 0.887 0.894 0.89 | ~ ~ ~
Zhou et al. [24] | EAST | CVPR 2017 | ~ ~ ~ | 0.93 0.83 0.870 | ~ ~ ~
Yue et al. [48] | | BMVC 2018 | ~ ~ ~ | 0.885 0.846 0.870 | ~ ~ ~
Zhong et al. [53] | AF-RPN | arXiv 2018 | ~ ~ ~ | 0.94 0.90 0.92 | ~ ~ ~
Xue et al. [85] | MSR | IJCAI 2019 | ~ ~ ~ | 0.918 0.885 0.901 | ~ ~ ~

2.2.2 Detection results on arbitrary-quadrilateral text datasets

Columns: Method | Model | Source, Time | IC15 [70] (P R F) | MSRA-TD500 [71] (P R F) | USTB-SV1K [65] (P R F) | SVT [66] (P R F)

Method category: Traditional
Le et al. [5] | HOCC | CVPR 2014 | ~ ~ ~ | 0.71 0.62 0.66 | ~ ~ ~ | ~ ~ ~
Yin et al. [7] | | TPAMI 2015 | ~ ~ ~ | 0.81 0.63 0.71 | 0.499 0.454 0.475 | ~ ~ ~
Wu et al. [9] | | TMM 2015 | ~ ~ ~ | 0.63 0.70 0.66 | ~ ~ ~ | ~ ~ ~
Tian et al. [17] | | IJCAI 2016 | ~ ~ ~ | 0.95 0.58 0.721 | 0.537 0.488 0.51 | ~ ~ ~
Yang et al. [33] | | TIP 2017 | ~ ~ ~ | 0.95 0.58 0.72 | 0.54 0.49 0.51 | ~ ~ ~
Liang et al. [8] | | TIP 2015 | ~ ~ ~ | 0.74 0.66 0.70 | ~ ~ ~ | ~ ~ ~

Method category: Segmentation
Zhang et al. [14] | | CVPR 2016 | 0.71 0.43 0.54 | 0.83 0.67 0.74 | ~ ~ ~ | ~ ~ ~
Zhu et al. [16] | | CVPR 2016 | 0.81 0.91 0.85 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
He et al. [18] | Text-CNN | TIP 2016 | ~ ~ ~ | 0.76 0.61 0.69 | ~ ~ ~ | ~ ~ ~
Yao et al. [21] | | arXiv 2016 | 0.723 0.587 0.648 | 0.765 0.753 0.759 | ~ ~ ~ | ~ ~ ~
Hu et al. [27] | WordSup | ICCV 2017 | 0.793 0.77 0.782 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
Wu et al. [28] | | ICCV 2017 | 0.91 0.78 0.84 | 0.77 0.78 0.77 | ~ ~ ~ | ~ ~ ~
Dai et al. [35] | FTSN | arXiv 2017 | 0.886 0.80 0.841 | 0.876 0.771 0.82 | ~ ~ ~ | ~ ~ ~
Deng et al. [40] | PixelLink | AAAI 2018 | 0.855 0.820 0.837 | 0.830 0.732 0.778 | ~ ~ ~ | ~ ~ ~
Liu et al. [42] | MCN | CVPR 2018 | 0.72 0.80 0.76 | 0.88 0.79 0.83 | ~ ~ ~ | ~ ~ ~
Lyu et al. [43] | | CVPR 2018 | 0.895 0.797 0.843 | 0.876 0.762 0.815 | ~ ~ ~ | ~ ~ ~
Chu et al. [45] | Border | ECCV 2018 | ~ ~ ~ | 0.830 0.774 0.801 | ~ ~ ~ | ~ ~ ~
Long et al. [46] | TextSnake | ECCV 2018 | 0.849 0.804 0.826 | 0.832 0.739 0.783 | ~ ~ ~ | ~ ~ ~
Yang et al. [47] | IncepText | IJCAI 2018 | 0.938 0.873 0.905 | 0.875 0.790 0.830 | ~ ~ ~ | ~ ~ ~
Wang et al. [54] | PSENet | CVPR 2019 | 0.8692 0.845 0.8569 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
Xu et al. [57] | TextField | arXiv 2018 | 0.843 0.805 0.824 | 0.874 0.759 0.813 | ~ ~ ~ | ~ ~ ~
Tian et al. [58] | FTDN | ICIP 2018 | 0.847 0.773 0.809 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
Tian et al. [83] | | CVPR 2019 | 0.883 0.850 0.866 | 0.842 0.817 0.829 | ~ ~ ~ | ~ ~ ~
Baek et al. [62] | CRAFT | CVPR 2019 | 0.898 0.843 0.869 | 0.882 0.782 0.829 | ~ ~ ~ | ~ ~ ~
Richardson et al. [87] | | IJCAI 2019 | 0.853 0.83 0.827 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
Wang et al. [88] | SAST | ACMM 2019 | 0.8755 0.8734 0.8744 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
Wang et al. [90] | PAN | ICCV 2019 | 0.84 0.819 0.829 | 0.844 0.838 0.821 | ~ ~ ~ | ~ ~ ~

Method category: Regression
Gupta et al. [15] | FCRN | CVPR 2016 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~ | 0.651 0.599 0.624
Liu et al. [25] | DMPNet | CVPR 2017 | 0.732 0.682 0.706 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
He et al. [26] | DDR | ICCV 2017 | 0.82 0.80 0.81 | 0.77 0.70 0.74 | ~ ~ ~ | ~ ~ ~
Jiang et al. [36] | R2CNN | arXiv 2017 | 0.856 0.797 0.825 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
Xing et al. [37] | ArbiText | arXiv 2017 | 0.792 0.735 0.759 | 0.78 0.72 0.75 | ~ ~ ~ | ~ ~ ~
Wang et al. [41] | ITN | CVPR 2018 | 0.857 0.741 0.795 | 0.903 0.723 0.803 | ~ ~ ~ | ~ ~ ~
Liao et al. [44] | RRD | CVPR 2018 | 0.88 0.8 0.838 | 0.876 0.73 0.79 | ~ ~ ~ | ~ ~ ~
Liao et al. [49] | TextBoxes++ | TIP 2018 | 0.878 0.785 0.829 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
He et al. [50] | | TIP 2018 | 0.85 0.80 0.82 | 0.91 0.81 0.86 | ~ ~ ~ | ~ ~ ~
Ma et al. [51] | RRPN | TMM 2018 | 0.822 0.732 0.774 | 0.821 0.677 0.742 | ~ ~ ~ | ~ ~ ~
Zhu et al. [55] | SLPR | arXiv 2018 | 0.855 0.836 0.845 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
Deng et al. [56] | | arXiv 2018 | 0.89 0.81 0.845 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
Sabyasachi et al. [60] | RGC | ICIP 2018 | 0.83 0.81 0.82 | 0.85 0.76 0.80 | ~ ~ ~ | ~ ~ ~
Wang et al. [82] | | CVPR 2019 | 0.892 0.86 0.876 | 0.852 0.821 0.836 | ~ ~ ~ | ~ ~ ~
He et al. [29] | SSTD | ICCV 2017 | 0.80 0.73 0.77 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
Tian et al. [13] | CTPN | ECCV 2016 | 0.74 0.52 0.61 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
He et al. [19] | | ACCV 2016 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~ | 0.87 0.73 0.79
Shi et al. [23] | SegLink | CVPR 2017 | 0.731 0.768 0.75 | 0.86 0.70 0.77 | ~ ~ ~ | ~ ~ ~
Wang et al. [86] | DSRN | IJCAI 2019 | 0.832 0.796 0.814 | 0.876 0.712 0.785 | ~ ~ ~ | ~ ~ ~
Tang et al. [89] | Seglink++ | PR 2019 | 0.837 0.803 0.820 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
Wang et al. [92] | ContourNet | CVPR 2020 | 0.876 0.861 0.869 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~

Method category: Hybrid
Tang et al. [52] | SSFT | TMM 2018 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~ | 0.541 0.758 0.631
Xie et al. [61] | SPCNet | AAAI 2019 | 0.89 0.86 0.87 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
Liu et al. [64] | PMTD | arXiv 2019 | 0.913 0.874 0.893 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
Liu et al. [80] | BDN | IJCAI 2019 | 0.881 0.846 0.863 | 0.87 0.815 0.842 | ~ ~ ~ | ~ ~ ~
Zhang et al. [81] | LOMO | CVPR 2019 | 0.878 0.876 0.877 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
Zhou et al. [24] | EAST | CVPR 2017 | 0.833 0.783 0.807 | 0.873 0.674 0.761 | ~ ~ ~ | ~ ~ ~
Yue et al. [48] | | BMVC 2018 | 0.866 0.789 0.823 | ~ ~ ~ | ~ ~ ~ | 0.691 0.660 0.675
Zhong et al. [53] | AF-RPN | arXiv 2018 | 0.89 0.83 0.86 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
Xue et al. [85] | MSR | IJCAI 2019 | ~ ~ ~ | 0.874 0.767 0.817 | ~ ~ ~ | ~ ~ ~
Liao et al. [91] | DB | AAAI 2020 | 0.918 0.832 0.873 | 0.915 0.792 0.849 | ~ ~ ~ | ~ ~ ~
Xiao et al. [93] | SDM | ECCV 2020 | 0.9196 0.8922 0.9057 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~

Columns: Method | Model | Source, Time | IC15 [70] (P R F) | MSRA-TD500 [71] (P R F) | USTB-SV1K [65] (P R F) | SVT [66] (P R F)

Method category: Traditional
Le et al. [5] | HOCC | CVPR 2014 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~ | 0.80 0.73 0.76

Method category: Segmentation
Yao et al. [21] | | arXiv 2016 | 0.432 0.27 0.333 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
Hu et al. [27] | WordSup | ICCV 2017 | 0.452 0.309 0.368 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
Lyu et al. [43] | | CVPR 2018 | 0.351 0.348 0.349 | ~ ~ ~ | 0.743 0.706 0.724 | ~ ~ ~
Chu et al. [45] | Border | ECCV 2018 | ~ ~ ~ | 0.782 0.588 0.671 | 0.777 0.621 0.690 | ~ ~ ~
Yang et al. [47] | IncepText | IJCAI 2018 | ~ ~ ~ | 0.785 0.569 0.660 | ~ ~ ~ | ~ ~ ~
Wang et al. [54] | PSENet | CVPR 2019 | ~ ~ ~ | ~ ~ ~ | 0.7535 0.6918 0.7213 | ~ ~ ~
Baek et al. [62] | CRAFT | CVPR 2019 | ~ ~ ~ | ~ ~ ~ | 0.806 0.682 0.739 | ~ ~ ~

Method category: Regression
He et al. [29] | SSTD | ICCV 2017 | 0.46 0.31 0.37 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
Gupta et al. [15] | FCRN | CVPR 2016 | ~ ~ ~ | ~ ~ ~ | 0.844 0.763 0.801 | ~ ~ ~
Liao et al. [49] | TextBoxes++ | TIP 2018 | 0.61 0.57 0.59 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
Ma et al. [51] | RRPN | TMM 2018 | ~ ~ ~ | ~ ~ ~ | 0.7669 0.5794 0.6601 | ~ ~ ~
Deng et al. [56] | | arXiv 2018 | 0.555 0.633 0.591 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
Cai et al. [59] | FFN | ICIP 2018 | 0.43 0.35 0.39 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
Xie et al. [79] | DeRPN | AAAI 2019 | 0.586 0.557 0.571 | ~ ~ ~ | ~ ~ ~ | ~ ~ ~
He et al.
[29] SSTD ICCV 2017 0.46 0.31 0.37 ~ ~ ~ ~ ~ ~ ~ ~ ~ Liao et al. [44] RRD CVPR 2018 ~ ~ ~ 0.591 0.775 0.670 ~ ~ ~ ~ ~ ~ Richardson et al. [87] IJCAI 2019 ~ ~ ~ ~ ~ ~ 0.729 0.618 0.669 ~ ~ ~ Wang et al. [88] SAST ACMM 2019 ~ ~ ~ ~ ~ ~ 0.7935 0.6653 0.7237 ~ ~ ~ Xie et al.[61] SPCNet AAAI 2019 Hybrid ~ ~ ~ ~ ~ ~ 0.806 0.686 0.741 ~ ~ ~ Liu et al. [64] PMTD arXiv 2019 ~ ~ ~ ~ ~ ~ 0.844 0.763 0.801 ~ ~ ~ Liu et al. [80] BDN IJCAI 2019 ~ ~ ~ ~ ~ ~ 0.791 0.698 0.742 ~ ~ ~ Zhang et al. [81] LOMO CVPR 2019 ~ ~ ~ 0.791 0.602 0.684 0.802 0.672 0.731 ~ ~ ~ Zhou et al. [24] EAST CVPR 2017 0.504 0.324 0.395 ~ ~ ~ ~ ~ ~ ~ ~ ~ Zhong et al. [53] AF-RPN arXiv 2018 ~ ~ ~ ~ ~ ~ 0.75 0.66 0.70 ~ ~ ~ Liao et al. [91] DB AAAI 2020 ~ ~ ~ ~ ~ ~ 0.831 0.679 0.747 ~ ~ ~ Xiao et al. [93] SDM ECCV 2020 ~ ~ ~ ~ ~ ~ 0.8679 0.7526 0.8061 ~ ~ ~ 2.2.3 不规则文本数据集的检测结果在本节中,我们仅选择适用于不规则文本检测的那些方法。 Method Model Source Time Method Category Total-text [74] SCUT-CTW1500 [75] P R F P R F Baek et al. [62] CRAFT CVPR 2019 Segmentation 0.876 0.799 0.836 0.860 0.811 0.835 Long et al. [46] TextSnake ECCV 2018 0.827 0.745 0.784 0.679 0.853 0.756 Tian et al. [83] CVPR 2019 ~ ~ ~ 81.7 84.2 80.1 Wang et al. [54] PSENet CVPR 2019 0.840 0.779 0.809 0.848 0.797 0.822 Wang et al. [88] SAST ACMM 2019 0.8557 0.7549 0.802 0.8119 0.8171 0.8145 Wang et al. [90] PAN ICCV 2019 0.893 0.81 0.85 0.864 0.812 0.837 Zhu et al. [55] SLPR arXiv 2018 Regression ~ ~ ~ 0.801 0.701 0.748 Liu et al. [63] CTD+TLOC PR 2019 ~ ~ ~ 0.774 0.698 0.734 Wang et al. [82] CVPR 2019 ~ ~ ~ 80.1 80.2 80.1 Liu et al. [84] CVPR 2019 0.814 0.791 0.802 0.787 0.761 0.774 Tang et al.[89] Seglink++ PR 2019 0.829 0.809 0.815 0.828 0.798 0.813 Wang et al. [92] ContourNet CVPR 2020 0.869 0.839 0.854 0.837 0.841 0.839 Zhang et al. [81] LOMO CVPR 2019 Hybrid 0.876 0.793 0.833 0.857 0.765 0.808 Xie et al.[61] SPCNet AAAI 2019 0.83 0.83 0.83 ~ ~ ~ Xue et al.[85] MSR IJCAI 2019 0.852 0.73 0.768 0.838 0.778 0.807 Liao et al. 
[91] DB AAAI 2020 0.871 0.825 0.847 0.869 0.802 0.834
Xiao et al. [93] SDM ECCV 2020 0.9085 0.8603 0.8837 0.884 0.8442 0.8636

3. Surveys
[A] [TPAMI-2015] Ye Q, Doermann D. Text detection and recognition in imagery: A survey[J]. IEEE transactions on pattern analysis and machine intelligence, 2015, 37(7): 1480-1500. paper
[B] [Frontiers-Comput. Sci-2016] Zhu Y, Yao C, Bai X. Scene text detection and recognition: Recent advances and future trends[J]. Frontiers of Computer Science, 2016, 10(1): 19-36. paper
[C] [arXiv-2018] Long S, He X, Yao C. Scene Text Detection and Recognition: The Deep Learning Era[J]. arXiv preprint arXiv:1811.04256, 2018. paper

4. Evaluation
If you are interested in developing better scene text detection metrics, some of the references recommended here might be useful:
[A] Wolf, Christian, and Jean-Michel Jolion. "Object count/area graphs for the evaluation of object detection and segmentation algorithms." International Journal of Document Analysis and Recognition (IJDAR) 8.4 (2006): 280-296. paper
[B] D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. K. Ghosh, A. D. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, F. Shafait, S. Uchida, and E. Valveny. ICDAR 2015 competition on robust reading. In ICDAR, pages 1156–1160, 2015. paper
[C] Calarasanu, Stefania, Jonathan Fabrizio, and Severine Dubuisson. "What is a good evaluation protocol for text localization systems? Concerns, arguments, comparisons and solutions." Image and Vision Computing 46 (2016): 1-17. paper
[D] Shi, Baoguang, et al. "ICDAR2017 competition on reading chinese text in the wild (RCTW-17)." 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). Vol. 1. IEEE, 2017. paper
[E] Nayef, N; Yin, F; Bizid, I; et al. ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on, volume 1, 1454–1459. IEEE. paper
[F] Dangla, Aliona, et al. "A first step toward a fair comparison of evaluation protocols for text detection algorithms." 2018 13th IAPR International Workshop on Document Analysis Systems (DAS). IEEE, 2018. paper
[G] He, Mengchao and Liu, Yuliang, et al. ICPR2018 Contest on Robust Reading for Multi-Type Web images. ICPR 2018. paper
[H] Liu, Yuliang and Jin, Lianwen, et al. "Tightness-aware Evaluation Protocol for Scene Text Detection" Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2019. paper code

5. OCR Service
Columns: OCR, API, Free
Tesseract OCR Engine × √
Azure √ √
ABBYY √ √
OCR Space √ √
SODA PDF OCR √ √
Free Online OCR √ √
Online OCR √ √
Super Tools √ √
Online Chinese Recognition √ √
Calamari OCR × √
Tencent OCR √ ×

6. References and Code
[1] Yao C, Bai X, Liu W, et al. Detecting texts of arbitrary orientations in natural images. 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012: 1083-1090. Paper
[2] Yin X C, Yin X, Huang K, et al. Robust text detection in natural scene images. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2013, 36(5): 970-83. Paper
[3] Li Y, Jia W, Shen C, et al. Characterness: An indicator of text in the wild. IEEE transactions on image processing, 2014, 23(4): 1666-1677. Paper
[4] Huang W, Qiao Y, Tang X. Robust scene text detection with convolution neural network induced mser trees. European Conference on Computer Vision (ECCV), 2014: 497-511. Paper
[5] Kang L, Li Y, Doermann D. Orientation robust text line detection in natural images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 4034-4041. Paper
[6] Sun L, Huo Q, Jia W, et al. A robust approach for text detection from natural scene images. Pattern Recognition, 2015, 48(9): 2906-2920. Paper
[7] Yin X C, Pei W Y, Zhang J, et al. Multi-orientation scene text detection with adaptive clustering. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2015 (9): 1930-1937. Paper
[8] Liang G, Shivakumara P, Lu T, et al. 
Multi-spectral fusion based approach for arbitrarily oriented scene text detection in video images. IEEE Transactions on Image Processing, 2015, 24(11): 4488-4501. Paper
[9] Wu L, Shivakumara P, Lu T, et al. A New Technique for Multi-Oriented Scene Text Line Detection and Tracking in Video. IEEE Trans. Multimedia, 2015, 17(8): 1137-1152. Paper
[10] Zheng Z, Wei S, et al. Symmetry-based text line detection in natural scenes. IEEE Conference on Computer Vision & Pattern Recognition (CVPR), 2015. Paper
[11] Tian S, Pan Y, Huang C, et al. Text flow: A unified text detection system in natural scene images. Proceedings of the IEEE international conference on computer vision (ICCV). 2015: 4651-4659. Paper
[12] Buta M, et al. FASText: Efficient unconstrained scene text detector. 2015 IEEE International Conference on Computer Vision (ICCV). 2015: 1206-1214. Paper
[13] Tian Z, Huang W, He T, et al. Detecting text in natural image with connectionist text proposal network. European conference on computer vision (ECCV), 2016: 56-72. Paper Code
[14] Zhang Z, Zhang C, Shen W, et al. Multi-oriented text detection with fully convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016: 4159-4167. Paper
[15] Gupta A, Vedaldi A, Zisserman A. Synthetic data for text localisation in natural images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016: 2315-2324. Paper Code
[16] S. Zhu and R. Zanibbi, A Text Detection System for Natural Scenes with Convolutional Feature Learning and Cascaded Classification, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 625-632. Paper
[17] Tian S, Pei W Y, Zuo Z Y, et al. Scene Text Detection in Video by Learning Locally and Globally. IJCAI. 2016: 2647-2653. Paper
[18] He T, Huang W, Qiao Y, et al. Text-attentional convolutional neural network for scene text detection. IEEE transactions on image processing, 2016, 25(6): 2529-2541. Paper
[19] He, Dafang and Yang, Xiao and Huang, Wenyi and Zhou, Zihan and Kifer, Daniel and Giles, C Lee. Aggregating local context for accurate scene text detection. ACCV, 2016. Paper
[20] Zhong Z, Jin L, Zhang S, et al. Deeptext: A unified framework for text proposal generation and text detection in natural images. arXiv preprint arXiv:1605.07314, 2016. Paper
[21] Yao C, Bai X, Sang N, et al. Scene text detection via holistic, multi-channel prediction. arXiv preprint arXiv:1606.09002, 2016. Paper
[22] Liao M, Shi B, Bai X, et al. TextBoxes: A Fast Text Detector with a Single Deep Neural Network. AAAI. 2017: 4161-4167. Paper Code
[23] Shi B, Bai X, Belongie S. Detecting Oriented Text in Natural Images by Linking Segments. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 3482-3490. Paper Code
[24] Zhou X, Yao C, Wen H, et al. EAST: an efficient and accurate scene text detector. CVPR, 2017: 2642-2651. Paper Code
[25] Liu Y, Jin L. Deep matching prior network: Toward tighter multi-oriented text detection. CVPR, 2017: 3454-3461. Paper
[26] He W, Zhang X Y, Yin F, et al. Deep Direct Regression for Multi-Oriented Scene Text Detection. Proceedings of the IEEE International Conference on Computer Vision (ICCV). 2017: 745-753. Paper
[27] Hu H, Zhang C, Luo Y, et al. Wordsup: Exploiting word annotations for character based text detection. ICCV, 2017. Paper
[28] Wu Y, Natarajan P. Self-organized text detection with minimal post-processing via border learning. ICCV, 2017. Paper
[29] He P, Huang W, He T, et al. Single shot text detector with regional attention. The IEEE International Conference on Computer Vision (ICCV). 2017, 6(7). Paper Code
[30] Tian S, Lu S, Li C. Wetext: Scene text detection under weak supervision. ICCV, 2017. Paper
[31] Zhu, Xiangyu and Jiang, Yingying et al. Deep Residual Text Detection Network for Scene Text. ICDAR, 2017. Paper
[32] Tang Y, Wu X. Scene Text Detection and Segmentation Based on Cascaded Convolution Neural Networks. IEEE Transactions on Image Processing, 2017, 26(3): 1509-1520. Paper
[33] Yang C, Yin X C, Pei W Y, et al. Tracking Based Multi-Orientation Scene Text Detection: A Unified Framework with Dynamic Programming. IEEE Transactions on Image Processing, 2017. Paper
[34] X. Ren, Y. Zhou, J. He, K. Chen, X. Yang and J. Sun, A Convolutional Neural Network-Based Chinese Text Detection Algorithm via Text Structure Modeling. In IEEE Transactions on Multimedia, vol. 19, no. 3, pp. 506-518, March 2017. Paper
[35] Dai Y, Huang Z, Gao Y, et al. Fused text segmentation networks for multi-oriented scene text detection. arXiv preprint arXiv:1709.03272, 2017. Paper
[36] Jiang Y, Zhu X, Wang X, et al. R2CNN: rotational region CNN for orientation robust scene text detection. arXiv preprint arXiv:1706.09579, 2017. Paper
[37] Xing D, Li Z, Chen X, et al. ArbiText: Arbitrary-Oriented Text Detection in Unconstrained Scene. arXiv preprint arXiv:1711.11249, 2017. Paper
[38] C. Wang, F. Yin and C. Liu, Scene Text Detection with Novel Superpixel Based Character Candidate Extraction. In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 2017, pp. 929-934. Paper
[39] Sheng Zhang, Yuliang Liu, Lianwen Jin et al. Feature Enhancement Network: A Refined Scene Text Detector. In AAAI 2018. Paper
[40] Dan Deng et al. PixelLink: Detecting Scene Text via Instance Segmentation. In AAAI 2018. Paper Code
[41] Fangfang Wang, Liming Zhao, Xi L et al. Geometry-Aware Scene Text Detection with Instance Transformation Network. In CVPR 2018. Paper
[42] Zichuan Liu, Guosheng Lin, Sheng Yang et al. Learning Markov Clustering Networks for Scene Text Detection. In CVPR 2018. Paper
[43] Pengyuan Lyu, Cong Yao, Wenhao Wu et al. Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation. In CVPR 2018. Paper
[44] Minghui L, Zhen Z, Baoguang S. Rotation-Sensitive Regression for Oriented Scene Text Detection. In CVPR 2018. Paper
[45] Chuhui Xue et al. Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping. In ECCV 2018. Paper
[46] Long, Shangbang and Ruan, Jiaqiang, et al. TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes. In ECCV, 2018. Paper
[47] Qiangpeng Yang, Mengli Cheng et al. IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection. In IJCAI 2018. Paper
[48] Xiaoyu Yue et al. Boosting up Scene Text Detectors with Guided CNN. In BMVC 2018. Paper
[49] Liao M, Shi B, Bai X. TextBoxes++: A Single-Shot Oriented Scene Text Detector. IEEE Transactions on Image Processing, 2018, 27(8): 3676-3690. Paper Code
[50] W. He, X. Zhang, F. Yin and C. Liu, Multi-Oriented and Multi-Lingual Scene Text Detection With Direct Regression, in IEEE Transactions on Image Processing, vol. 27, no. 11, pp. 5406-5419, 2018. Paper
[51] Ma J, Shao W, Ye H, et al. Arbitrary-oriented scene text detection via rotation proposals. In IEEE Transactions on Multimedia, 2018. Paper Code
[52] Youbao Tang and Xiangqian Wu. Scene Text Detection Using Superpixel-Based Stroke Feature Transform and Deep Learning Based Region Classification. In TMM, 2018. Paper
[53] Zhuoyao Zhong, Lei Sun and Qiang Huo. An Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches. arXiv preprint arXiv:1804.09003. 2018. Paper
[54] Wenhai W, Enze X, et al. Shape Robust Text Detection with Progressive Scale Expansion Network. In CVPR 2019. Paper Code
[55] Zhu Y, Du J. Sliding Line Point Regression for Shape Robust Scene Text Detection. arXiv preprint arXiv:1801.09969, 2018. Paper
[56] Linjie D, Yanxiang Gong, et al. Detecting Multi-Oriented Text with Corner-based Region Proposals. arXiv preprint arXiv:1804.02690, 2018. Paper Code
[57] Yongchao Xu, Yukang Wang, Wei Zhou, et al. TextField: Learning A Deep Direction Field for Irregular Scene Text Detection. arXiv preprint arXiv:1812.01393, 2018. Paper
[58] Xiaowei Tian, Dao Wu, Rui Wang, Xiaochun Cao. Focal Text: an Accurate Text Detection with Focal Loss. In ICIP 2018. Paper
[59] Chenqin C, Pin L, Bing S. Feature Fusion Network for Scene Text Detection. In ICIP, 2018. Paper
[60] Sabyasachi Mohanty et al. Recurrent Global Convolutional Network for Scene Text Detection. In ICIP 2018. Paper
[61] Enze Xie, et al. Scene Text Detection with Supervised Pyramid Context Network. In AAAI 2019. Paper
[62] Youngmin Baek, Bado Lee, et al. Character Region Awareness for Text Detection. In CVPR 2019. Paper
[63] Yuliang L, Lianwen J, Shuaitao Z, et al. Curved Scene Text Detection via Transverse and Longitudinal Sequence Connection. Pattern Recognition, 2019. Paper Code
[64] Jingchao Liu, Xuebo Liu, et al, Pyramid Mask Text Detector. arXiv preprint arXiv:1903.11800, 2019. Paper Code
[79] Lele Xie, Yuliang Liu, Lianwen Jin, Zecheng Xie, DeRPN: Taking a further step toward more general object detection. In AAAI, 2019. Paper Code
[80] Yuliang Liu, Lianwen Jin, et al, Omnidirectional Scene Text Detection with Sequential-free Box Discretization. In IJCAI, 2019. Paper Code
[81] Chengquan Zhang, Borong Liang, et al, Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes. In CVPR, 2019. Paper
[82] Xiaobing Wang, Yingying Jiang, et al, Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation. In CVPR, 2019. Paper
[83] Zhuotao Tian, Michelle Shu, et al, Learning Shape-Aware Embedding for Scene Text Detection. In CVPR, 2019. Paper
[84] Zichuan Liu, Guosheng Lin, et al, Towards Robust Curve Text Detection with Conditional Spatial Expansion. In CVPR, 2019. Paper
[85] Xue C, Lu S, Zhang W. MSR: multi-scale shape regression for scene text detection. In IJCAI, 2019. Paper
[86] Wang Y, Xie H, Fu Z, et al. DSRN: a deep scale relationship network for scene text detection. In IJCAI, 2019: 947-953. Paper
[87] Elad Richardson, et al, It's All About The Scale -- Efficient Text Detection Using Adaptive Scaling. In WACV, 2020. Paper
[88] Pengfei Wang, et al, A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning. In ACM MM, 2019. Paper
[89] Jun Tang, et al, SegLink++: Detecting Dense and Arbitrary-shaped Scene Text by Instance-aware Component Grouping. In PR, 2019. Paper
[90] Wenhai Wang, et al, Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network. In ICCV, 2019. Paper
[91] Minghui Liao, et al, Real-time Scene Text Detection with Differentiable Binarization. In AAAI, 2020. Paper Code
[92] Wang, Yuxin, et al. ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene Text Detection. CVPR. 2020. Paper Code
[93] Xiao, et al, Sequential Deformation for Accurate Scene Text Detection. In ECCV, 2020. Paper

Datasets
USTB-SV1K [65]: Xu-Cheng Yin, Xuwang Yin, Kaizhu Huang, and Hong-Wei Hao, Robust text detection in natural scene images, IEEE Trans. Pattern Analysis and Machine Intelligence (TPAMI), preprint, 2013. Paper
SVT [66]: Wang, Kai, and S. Belongie. Word Spotting in the Wild. European Conference on Computer Vision (ECCV), 2010: 591-604. Paper
ICDAR2005 [67]: Lucas, S: ICDAR 2005 text locating competition results. In: ICDAR, 2005. Paper
ICDAR2011 [68]: Shahab, A, Shafait, F, Dengel, A: ICDAR 2011 robust reading competition challenge 2: Reading text in scene images. In: ICDAR, 2011. Paper
ICDAR2013 [69]: D. Karatzas, F. Shafait, S. Uchida, et al. ICDAR 2013 robust reading competition. In ICDAR, 2013. Paper
ICDAR2015 [70]: D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. K. Ghosh, A. D. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, F. Shafait, S. Uchida, and E. Valveny. ICDAR 2015 competition on robust reading. In ICDAR, pages 1156–1160, 2015. Paper
MSRA-TD500 [71]: C. Yao, X. Bai, W. Liu, Y. Ma, and Z. Tu, Detecting texts of arbitrary orientations in natural images. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2012, pp. 1083–1090. Paper
COCO-Text [72]: Veit A, Matera T, Neumann L, et al. 
Coco-text: Dataset and benchmark for text detection and recognition in natural images. arXiv preprint arXiv:1601.07140, 2016. Paper
RCTW-17 [73]: Shi B, Yao C, Liao M, et al. ICDAR2017 competition on reading chinese text in the wild (RCTW-17). Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on. IEEE, 2017, 1: 1429-1434. Paper
Total-Text [74]: Chee C K, Chan C S. Total-text: A comprehensive dataset for scene text detection and recognition. Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on. IEEE, 2017, 1: 935-942. Paper
SCUT-CTW1500 [75]: Yuliang L, Lianwen J, Shuaitao Z, et al. Curved Scene Text Detection via Transverse and Longitudinal Sequence Connection. Pattern Recognition, 2019. Paper
MLT 2017 [76]: Nayef, N; Yin, F; Bizid, I; et al. ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on, volume 1, 1454–1459. IEEE. Paper
OSTD [77]: Chucai Yi and YingLi Tian, Text string detection from natural scenes by structure-based partition and grouping, In IEEE Transactions on Image Processing, vol. 20, no. 9, pp. 2594–2605, 2011. Paper
CTW [78]: Yuan T L, Zhu Z, Xu K, et al. Chinese Text in the Wild. arXiv preprint arXiv:1803.00085, 2018. Paper

If you find any problems in our resources, or if we have missed any good papers or code, please let us know at liuchongyu1996@gmail.com. Thank you for your contribution.

Copyright
Copyright © 2019 SCUT-DLVC. All Rights Reserved.
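The P, R and F columns in the result tables of this post are precision, recall and F-measure. When reading the tables or adding a new row, it can help to recompute F from P and R. The sketch below assumes F is the usual harmonic mean, F = 2PR / (P + R), which is not stated in the post itself; the function name `f_measure` is only illustrative. The sample values are the EAST [24] row on IC15.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# EAST [24] on IC15 as listed in the tables: P = 0.833, R = 0.783, F = 0.807.
east_f = f_measure(0.833, 0.783)
print(round(east_f, 3))
```

Not every row in the tables reproduces exactly this way, so a mismatch is a reason to check the cited paper rather than proof of an error.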
March 30, 2021
2021-03-30
Scene Text Detection Resources (a roundup of scene text recognition resources) [Repost]
1. Datasets

1.1 Horizontal-Text Datasets

ICDAR 2003 (IC03):
Introduction: It contains 509 images in total, 258 for training and 251 for testing. Specifically, it contains 1110 text instances in the training set and 1156 in the testing set. It has word-level annotation. IC03 only considers English text instances.
Link: IC03-download

ICDAR 2011 (IC11):
Introduction: IC11 is an English dataset for text detection. It contains 484 images, 229 for training and 255 for testing. There are 1564 text instances in this dataset. It provides both word-level and character-level annotation.
Link: IC11-download

ICDAR 2013 (IC13):
Introduction: IC13 is almost the same as IC11. It contains 462 images in total, 229 for training and 233 for testing. Specifically, it contains 849 text instances in the training set and 1095 in the testing set.
Link: IC13-download

1.2 Arbitrary-Quadrilateral-Text Datasets

USTB-SV1K:
Introduction: USTB-SV1K is an English dataset. It contains 1000 street images from Google Street View with 2955 text instances in total. It only provides word-level annotations.
Link: USTB-SV1K-download

SVT:
Introduction: It contains 350 images with 725 English text instances in total. SVT has both character-level and word-level annotations. The images of SVT are harvested from Google Street View and have low resolution.
Link: SVT-download

SVT-P:
Introduction: It contains 639 cropped word images for testing. Images were selected from the side-view angle snapshots in Google Street View. Therefore, most images are heavily distorted by the non-frontal view angle. It is an improved version of SVT.
Link: SVT-P-download (Password: vnis)

ICDAR 2015 (IC15):
Introduction: It contains 1500 images in total, 1000 for training and 500 for testing. Specifically, it contains 17548 text instances. It provides word-level annotations. IC15 is the first incidental scene text dataset and it only considers English words.
Link: IC15-download

COCO-Text:
Introduction: It contains 63686 images in total, 43686 for training, 10000 for validating and 10000 for testing. Specifically, it contains 145859 cropped word images for testing, including handwritten and printed, clear and blurred, English and non-English.
Link: COCO-Text-download

MSRA-TD500:
Introduction: It contains 500 images in total. It provides text-line-level annotation rather than word-level annotation, and polygon boxes rather than axis-aligned rectangles for text region annotation. It contains both English and Chinese text instances.
Link: MSRA-TD500-download

MLT 2017:
Introduction: It contains 18000 natural images in total. It provides word-level annotation. There are 9 languages in MLT 2017. It is a more realistic and complex dataset for scene text detection and recognition.
Link: MLT-download

MLT 2019:
Introduction: It contains 20000 images in total. It provides word-level annotation. Compared to MLT 2017, this dataset has 10 languages. It is a more realistic and complex dataset for scene text detection and recognition.
Link: MLT-2019-download

CTW:
Introduction: It contains 32285 high-resolution street view images of Chinese text, with 1018402 character instances in total. All images are annotated at the character level, including the underlying character type, bounding box, and 6 other attributes. These attributes indicate whether the character's background is complex, whether it is raised, whether it is hand-written or printed, whether it is occluded, whether it is distorted, and whether it uses word-art.
Link: CTW-download

RCTW-17:
Introduction: It contains 12514 images in total, 11514 for training and 1000 for testing. Images in RCTW-17 were mostly collected by camera or mobile phone, and others were generated images. Text instances are annotated with parallelograms. It is the first large-scale Chinese dataset, and was also the largest published one by then.
Link: RCTW-17-download

ReCTS:
Introduction: This dataset is a large-scale Chinese street view trademark dataset. It is labeled at the level of Chinese words and Chinese text lines. The labeling method is arbitrary quadrilateral labeling. It contains 20000 images in total.
Link: ReCTS-download

1.3 Irregular-Text Datasets

CUTE80:
Introduction: It contains 80 high-resolution images taken in natural scenes. Specifically, it contains 288 cropped word images for testing. The dataset focuses on curved text. No lexicon is provided.
Link: CUTE80-download

Total-Text:
Introduction: It contains 1,555 images in total. Specifically, it contains 11,459 cropped word images with more than three different text orientations: horizontal, multi-oriented and curved.
Link: Total-Text-download

SCUT-CTW1500:
Introduction: It contains 1500 images in total, 1000 for training and 500 for testing. Specifically, it contains 10751 cropped word images for testing. Annotations in CTW-1500 are polygons with 14 vertexes. The dataset mainly consists of Chinese and English.
Link: CTW-1500-download

LSVT:
Introduction: LSVT consists of 20,000 testing images, 30,000 training images with full annotations and 400,000 training images with weak annotations, which are referred to as partial labels. The labeled text regions demonstrate the diversity of text: horizontal, multi-oriented and curved.
Link: LSVT-download

ArT:
Introduction: ArT consists of 10,166 images, 5,603 for training and 4,563 for testing. They were collected with text shape diversity in mind, and all text shapes are well represented in ArT.
Link: ArT-download

1.4 Synthetic Datasets

Synth80k:
Introduction: It contains 800 thousand images with approximately 8 million synthetic word instances.
Each text instance is annotated with its text-string, word-level and character-level bounding-boxes.
Link: Synth80k-download

SynthText:
Introduction: It contains 6 million cropped word images. The generation process is similar to that of Synth90k. It is also annotated in horizontal-style.
Link: SynthText-download

1.5 Comparison of Datasets
Columns: Datasets, Language, Image (Total Train Test), Text instance (Total Train Test), Text Shape (Horizontal, Arbitrary-Quadrilateral, Multi-oriented), Annotation level (Char, Word, Text-Line)
IC03 English 509 258 251 2266 1110 1156 ✓ ✕ ✕ ✕ ✓ ✕
IC11 English 484 229 255 1564 ~ ~ ✓ ✕ ✕ ✓ ✓ ✕
IC13 English 462 229 233 1944 849 1095 ✓ ✕ ✕ ✓ ✓ ✕
USTB-SV1K English 1000 500 500 2955 ~ ~ ✓ ✓ ✕ ✕ ✓ ✕
SVT English 350 100 250 725 211 514 ✓ ✓ ✕ ✓ ✓ ✕
SVT-P English 238 ~ ~ 639 ~ ~ ✓ ✓ ✕ ✕ ✓ ✕
IC15 English 1500 1000 500 17548 12318 5230 ✓ ✓ ✕ ✕ ✓ ✕
COCO-Text English 63686 43686 20000 145859 118309 27550 ✓ ✓ ✕ ✕ ✓ ✕
MSRA-TD500 English/Chinese 500 300 200 ~ ~ ~ ✓ ✓ ✕ ✕ ✕ ✓
MLT 2017 Multi-lingual 18000 7200 10800 ~ ~ ~ ✓ ✓ ✕ ✕ ✓ ✕
MLT 2019 Multi-lingual 20000 10000 10000 ~ ~ ~ ✓ ✓ ✕ ✕ ✓ ✕
CTW Chinese 32285 25887 6398 1018402 812872 205530 ✓ ✓ ✕ ✓ ✓ ✕
RCTW-17 English/Chinese 12514 11514 1000 ~ ~ ~ ✓ ✓ ✕ ✕ ✕ ✓
ReCTS Chinese 20000 ~ ~ ~ ~ ~ ✓ ✓ ✕ ✓ ✓ ✕
CUTE80 English 80 ~ ~ ~ ~ ~ ✕ ✕ ✓ ✕ ✓ ✓
Total-Text English 1525 1225 300 9330 ~ ~ ✓ ✓ ✓ ✕ ✓ ✓
CTW-1500 English/Chinese 1500 1000 500 10751 ~ ~ ✓ ✓ ✓ ✕ ✓ ✓
LSVT English/Chinese 450000 430000 20000 ~ ~ ~ ✓ ✓ ✓ ✕ ✓ ✓
ArT English/Chinese 10166 5603 4563 ~ ~ ~ ✓ ✓ ✓ ✕ ✓ ✕
Synth80k English 80k ~ ~ 8m ~ ~ ✓ ✕ ✕ ✓ ✓ ✕
SynthText English 800k ~ ~ 6m ~ ~ ✓ ✓ ✕ ✕ ✓ ✕

2. Summary of Scene Text Detection Resources

2.1 Comparison of Methods
Scene text detection methods can be divided into four categories:
(a) Traditional methods;
(b) Segmentation-based methods;
(c) Regression-based methods;
(d) Hybrid methods.
It is important to notice that:
(1) "Hori" stands for horizontal scene text datasets.
(2) "Quad" stands for arbitrary-quadrilateral-text datasets.
(3) "Irreg" stands for irregular scene text datasets.
(4) "Traditional method" stands for methods that don't rely on deep learning.

2.1.1 Traditional Methods

| Method | Model | Code | Hori | Quad | Irreg | Source | Time | Highlight |
|---|---|---|---|---|---|---|---|---|
| Yao et al. [1] | TD-Mixture | ✕ | ✓ | ✓ | ✕ | CVPR | 2012 | 1) A new dataset, MSRA-TD500, and a protocol for evaluation. 2) A two-level classification scheme and two sets of feature extractors. |
| Yin et al. [2] | | ✕ | ✓ | ✕ | ✕ | TPAMI | 2013 | Extract Maximally Stable Extremal Regions (MSERs) as character candidates and group them together. |
| Le et al. [5] | HOCC | ✕ | ✓ | ✓ | ✕ | CVPR | 2014 | HOCC + MSERs |
| Yin et al. [7] | | ✕ | ✓ | ✓ | ✕ | TPAMI | 2015 | Presenting a unified distance metric learning framework for adaptive hierarchical clustering. |
| Wu et al. [9] | | ✕ | ✓ | ✓ | ✕ | TMM | 2015 | Exploring gradient directional symmetry at component level for smoothing edge components before text detection. |
| Tian et al. [17] | | ✕ | ✓ | ✕ | ✕ | IJCAI | 2016 | Scene text is first detected locally in individual frames and finally linked by an optimal tracking trajectory. |
| Yang et al. [33] | | ✕ | ✓ | ✓ | ✕ | TIP | 2017 | A text detector locates character candidates and extracts text regions, which are then linked by an optimal tracking trajectory. |
| Liang et al. [8] | | ✕ | ✓ | ✓ | ✓ | TIP | 2015 | Exploring maximally stable extremal regions along with stroke width transform for detecting candidate text regions. |
| Michal et al. [12] | FASText | ✕ | ✓ | ✓ | ✕ | ICCV | 2015 | Stroke keypoints are efficiently detected and then exploited to obtain stroke segmentations. |

2.1.2 Segmentation-based Methods

| Method | Model | Code | Hori | Quad | Irreg | Source | Time | Highlight |
|---|---|---|---|---|---|---|---|---|
| Li et al. [3] | | ✕ | ✓ | ✓ | ✕ | TIP | 2014 | (1) Develop three novel cues that are tailored for character detection and a Bayesian method for their integration; (2) design a Markov random field model to exploit the inherent dependencies between characters. |
| Zhang et al. [14] | | ✕ | ✓ | ✓ | ✕ | CVPR | 2016 | Utilizing FCN for salient map detection and for predicting the centroid of each character. |
| Zhu et al. [16] | | ✕ | ✓ | ✓ | ✕ | CVPR | 2016 | Performs a graph-based segmentation of connected components into words (Word-Graph). |
| He et al. [18] | Text-CNN | ✕ | ✓ | ✓ | ✕ | TIP | 2016 | Developing a new learning mechanism to train the Text-CNN with multi-level and rich supervised information. |
| Yao et al. [21] | | ✕ | ✓ | ✓ | ✕ | arXiv | 2016 | Proposing to localize text in a holistic manner, by casting scene text detection as a semantic segmentation problem. |
| Hu et al. [27] | WordSup | ✕ | ✓ | ✓ | ✕ | ICCV | 2017 | Proposing a weakly supervised framework that can utilize word annotations. Then the detected characters are fed to a text structure analysis module. |
| Wu et al. [28] | | ✕ | ✓ | ✓ | ✕ | ICCV | 2017 | Introducing the border class to the text detection problem for the first time, and validating that the decoding process is largely simplified with the help of text borders. |
| Tang et al. [32] | | ✕ | ✓ | ✕ | ✕ | TIP | 2017 | A text-aware candidate text region (CTR) extraction model + CTR refinement model. |
| Dai et al. [35] | FTSN | ✕ | ✓ | ✓ | ✕ | arXiv | 2017 | Detecting and segmenting the text instance jointly and simultaneously, leveraging merits from both semantic segmentation task and region proposal based object detection task. |
| Wang et al. [38] | | ✕ | ✓ | ✕ | ✕ | ICDAR | 2017 | This paper proposes a novel character candidate extraction method based on super-pixel segmentation and hierarchical clustering. |
| Deng et al. [40] | PixelLink | ✓ | ✓ | ✓ | ✕ | AAAI | 2018 | Text instances are first segmented out by linking pixels within the same instance together. |
| Liu et al. [42] | MCN | ✕ | ✓ | ✓ | ✕ | CVPR | 2018 | Stochastic Flow Graph (SFG) + Markov Clustering. |
| Lyu et al. [43] | | ✕ | ✓ | ✓ | ✕ | CVPR | 2018 | Detect scene text by localizing corner points of text bounding boxes and segmenting text regions in relative positions. |
| Chu et al. [45] | Border | ✕ | ✓ | ✓ | ✕ | ECCV | 2018 | The paper presents a novel scene text detection technique that makes use of semantics-aware text borders and bootstrapping based text segment augmentation. |
| Long et al. [46] | TextSnake | ✕ | ✓ | ✓ | ✓ | ECCV | 2018 | The paper proposes TextSnake, which is able to effectively represent text instances in horizontal, oriented and curved forms based on symmetry axis. |
| Yang et al. [47] | IncepText | ✕ | ✓ | ✓ | ✕ | IJCAI | 2018 | Designing a novel Inception-Text module and introducing deformable PSROI pooling to deal with multi-oriented text detection. |
| Yue et al. [48] | | ✕ | ✓ | ✓ | ✕ | BMVC | 2018 | Proposing a general framework for text detection called Guided CNN to achieve the two goals simultaneously. |
| Zhong et al. [53] | AF-RPN | ✕ | ✓ | ✓ | ✕ | arXiv | 2018 | Presenting AF-RPN as an anchor-free and scale-friendly region proposal network for the Faster R-CNN framework. |
| Wang et al. [54] | PSENet | ✓ | ✓ | ✓ | ✓ | CVPR | 2019 | Proposing a novel Progressive Scale Expansion Network (PSENet), designed as a segmentation-based detector with multiple predictions for each text instance. |
| Xu et al. [57] | TextField | ✕ | ✓ | ✓ | ✓ | arXiv | 2018 | Presenting a novel direction field which can represent scene texts of arbitrary shapes. |
| Tian et al. [58] | FTDN | ✕ | ✓ | ✓ | ✕ | ICIP | 2018 | FTDN is able to segment text region and simultaneously regress text box at pixel-level. |
| Tian et al. [83] | | ✕ | ✓ | ✓ | ✓ | CVPR | 2019 | Constraining embedding feature of pixels inside the same text region to share similar properties. |
| Huang et al. [4] | MSERs-CNN | ✕ | ✓ | ✕ | ✕ | ECCV | 2014 | Combining MSERs with CNN. |
| Sun et al. [6] | | ✕ | ✓ | ✕ | ✕ | PR | 2015 | Presenting a robust text detection approach based on color-enhanced CER and neural networks. |
| Baek et al. [62] | CRAFT | ✕ | ✓ | ✓ | ✓ | CVPR | 2019 | Proposing CRAFT, which effectively detects text areas by exploring each character and the affinity between characters. |
| Richardson et al. [87] | | ✕ | ✓ | ✓ | ✕ | WACV | 2019 | Presenting an additional scale predictor to estimate a better scale of text regions for testing. |
| Wang et al. [88] | SAST | ✕ | ✓ | ✓ | ✓ | ACMM | 2019 | Presenting a context attended multi-task learning framework for scene text detection. |
| Wang et al. [90] | PAN | ✕ | ✓ | ✓ | ✓ | ICCV | 2019 | Proposing an efficient and accurate arbitrary-shaped text detector called Pixel Aggregation Network (PAN). |

2.1.3 Regression-based Methods

| Method | Model | Code | Hori | Quad | Irreg | Source | Time | Highlight |
|---|---|---|---|---|---|---|---|---|
| Gupta et al. [15] | FCRN | ✓ | ✓ | ✕ | ✕ | CVPR | 2016 | (a) Proposing a fast and scalable engine to generate synthetic images of text in clutter; (b) FCRN. |
| Zhong et al. [20] | DeepText | ✕ | ✓ | ✕ | ✕ | arXiv | 2016 | (a) Inception-RPN; (b) Utilize ambiguous text category (ATC) information and multilevel region-of-interest pooling (MLRP). |
| Liao et al. [22] | TextBoxes | ✓ | ✓ | ✕ | ✕ | AAAI | 2017 | Mainly based on the SSD object detection framework. |
| Liu et al. [25] | DMPNet | ✕ | ✓ | ✓ | ✕ | CVPR | 2017 | Quadrilateral sliding windows + shared Monte-Carlo method for fast and accurate computing of the polygonal areas + a sequential protocol for relative regression. |
| He et al. [26] | DDR | ✕ | ✓ | ✓ | ✕ | ICCV | 2017 | Proposing an FCN that has bi-task outputs where one is pixel-wise classification between text and non-text, and the other is direct regression to determine the vertex coordinates of quadrilateral text boundaries. |
| Jiang et al. [36] | R2CNN | ✕ | ✓ | ✓ | ✕ | arXiv | 2017 | Using the Region Proposal Network (RPN) to generate axis-aligned bounding boxes that enclose the texts with different orientations. |
| Xing et al. [37] | ArbiText | ✕ | ✓ | ✓ | ✕ | arXiv | 2017 | Adopting the circle anchors and incorporating a pyramid pooling module into the Single Shot MultiBox Detector framework. |
| Zhang et al. [39] | FEN | ✕ | ✓ | ✕ | ✕ | AAAI | 2018 | Proposing a refined scene text detector with a novel Feature Enhancement Network (FEN) for Region Proposal and Text Detection Refinement. |
| Wang et al. [41] | ITN | ✕ | ✓ | ✓ | ✕ | CVPR | 2018 | ITN is presented to learn the geometry-aware representation encoding the unique geometric configurations of scene text instances with in-network transformation embedding. |
| Liao et al. [44] | RRD | ✕ | ✓ | ✓ | ✕ | CVPR | 2018 | The regression branch extracts rotation-sensitive features, while the classification branch extracts rotation-invariant features by pooling the rotation sensitive features. |
| Liao et al. [49] | TextBoxes++ | ✓ | ✓ | ✓ | ✕ | TIP | 2018 | Mainly based on the SSD object detection framework; it replaces the rectangular box representation in conventional object detectors with a quadrilateral or oriented rectangle representation. |
| He et al. [50] | | ✕ | ✓ | ✓ | ✕ | TIP | 2018 | Proposing a scene text detection framework based on fully convolutional network with a bi-task prediction module. |
| Ma et al. [51] | RRPN | ✓ | ✓ | ✓ | ✕ | TMM | 2018 | RRPN + RRoI Pooling. |
| Zhu et al. [55] | SLPR | ✕ | ✓ | ✓ | ✓ | arXiv | 2018 | SLPR regresses multiple points on the edge of text line and then utilizes these points to sketch the outlines of the text. |
| Deng et al. [56] | | ✓ | ✓ | ✓ | ✕ | arXiv | 2018 | CRPN employs corners to estimate the possible locations of text instances. It also designs an embedded data augmentation module inside the region-wise subnetwork. |
| Cai et al. [59] | FFN | ✕ | ✓ | ✕ | ✕ | ICIP | 2018 | Proposing a Feature Fusion Network to deal with text regions that differ enormously in size. |
| Sabyasachi et al. [60] | RGC | ✕ | ✓ | ✓ | ✕ | ICIP | 2018 | Proposing a novel recurrent architecture to improve the learnings of a feature map at a given time. |
| Liu et al. [63] | CTD | ✓ | ✓ | ✓ | ✓ | PR | 2019 | CTD + TLOC + PNMS |
| Xie et al. [79] | DeRPN | ✓ | ✓ | ✕ | ✕ | AAAI | 2019 | DeRPN utilizes anchor string mechanism instead of anchor box in RPN. |
| Wang et al. [82] | | ✕ | ✓ | ✓ | ✓ | CVPR | 2019 | Text-RPN + RNN |
| Liu et al. [84] | | ✕ | ✓ | ✓ | ✓ | CVPR | 2019 | CSE mechanism |
| He et al. [29] | SSTD | ✓ | ✓ | ✓ | ✕ | ICCV | 2017 | Proposing an attention mechanism. Then developing a hierarchical inception module which efficiently aggregates multi-scale inception features. |
| Tian et al. [11] | | ✕ | ✓ | ✕ | ✕ | ICCV | 2015 | Cascade boosting detects character candidates, and the min-cost flow network model gets the final result. |
| Tian et al. [13] | CTPN | ✓ | ✓ | ✕ | ✕ | ECCV | 2016 | 1) RPN + LSTM. 2) RPN incorporates a new vertical anchor mechanism and LSTM connects the regions to get the final result. |
| He et al. [19] | | ✕ | ✓ | ✓ | ✕ | ACCV | 2016 | ER detector detects regions to get coarse prediction of text regions. Then the local context is aggregated to classify the remaining regions to obtain a final prediction. |
| Shi et al. [23] | SegLink | ✓ | ✓ | ✓ | ✕ | CVPR | 2017 | Decomposing text into segments and links. A link connects two adjacent segments. |
| Tian et al. [30] | WeText | ✕ | ✓ | ✕ | ✕ | ICCV | 2017 | Proposing a weakly supervised scene text detection method (WeText). |
| Zhu et al. [31] | RTN | ✕ | ✓ | ✕ | ✕ | ICDAR | 2017 | Mainly based on the CTPN vertical proposal mechanism. |
| Ren et al. [34] | | ✕ | ✓ | ✕ | ✕ | TMM | 2017 | Proposing a CNN-based detector. It contains a text structure component detector layer, a spatial pyramid layer, and a multi-input-layer deep belief network (DBN). |
| Zhang et al. [10] | | ✕ | ✓ | ✕ | ✕ | CVPR | 2015 | The proposed algorithm exploits the symmetry property of character groups and allows for direct extraction of text lines from natural images. |
| Wang et al. [86] | DSRN | ✕ | ✓ | ✓ | ✕ | IJCAI | 2019 | Presenting a scale-transfer module and scale relationship module to handle the problem of scale variation. |
| Tang et al. [89] | Seglink++ | ✕ | ✓ | ✓ | ✓ | PR | 2019 | Presenting instance aware component grouping (ICG) for arbitrary-shape text detection. |
| Wang et al. [92] | ContourNet | ✓ | ✓ | ✓ | ✓ | CVPR | 2020 | 1. A scale-insensitive Adaptive Region Proposal Network (AdaptiveRPN); 2. Local Orthogonal Texture-aware Module (LOTM). |

2.1.4 Hybrid Methods

| Method | Model | Code | Hori | Quad | Irreg | Source | Time | Highlight |
|---|---|---|---|---|---|---|---|---|
| Tang et al. [52] | SSFT | ✕ | ✓ | ✕ | ✕ | TMM | 2018 | Proposing a novel scene text detection method that involves superpixel-based stroke feature transform (SSFT) and deep learning based region classification (DLRC). |
| Xie et al. [61] | SPCNet | ✕ | ✓ | ✓ | ✓ | AAAI | 2019 | Text Context module + Re-Score mechanism. |
| Liu et al. [64] | PMTD | ✓ | ✓ | ✓ | ✕ | arXiv | 2019 | Perform “soft” semantic segmentation. It assigns a soft pyramid label (i.e., a real value between 0 and 1) for each pixel within text instance. |
| Liu et al. [80] | BDN | ✓ | ✓ | ✓ | ✕ | IJCAI | 2019 | Discretizing bounding boxes into key edges to address label confusion for text detection. |
| Zhang et al. [81] | LOMO | ✕ | ✓ | ✓ | ✓ | CVPR | 2019 | DR + IRM + SEM |
| Zhou et al. [24] | EAST | ✓ | ✓ | ✓ | ✕ | CVPR | 2017 | The pipeline directly predicts words or text lines of arbitrary orientations and quadrilateral shapes in full images with instance segmentation. |
| Yue et al. [48] | | ✕ | ✓ | ✓ | ✕ | BMVC | 2018 | Proposing a general framework for text detection called Guided CNN to achieve the two goals simultaneously. |
| Zhong et al. [53] | AF-RPN | ✕ | ✓ | ✓ | ✕ | arXiv | 2018 | Presenting AF-RPN as an anchor-free and scale-friendly region proposal network for the Faster R-CNN framework. |
| Xue et al. [85] | MSR | ✕ | ✓ | ✓ | ✓ | IJCAI | 2019 | Presenting a novel multi-scale regression network. |
| Liao et al. [91] | DB | ✓ | ✓ | ✓ | ✓ | AAAI | 2020 | Presenting differentiable binarization module to adaptively set the thresholds for binarization, which simplifies the post-processing. |
| Xiao et al. [93] | SDM | ✕ | ✓ | ✓ | ✓ | ECCV | 2020 | 1. A novel sequential deformation method; 2. auxiliary character counting supervision. |

2.2 Detection Results

2.2.1 Detection Results on Horizontal-Text Datasets

| Method | Model | Source | Time | Method Category | IC11 [68] P | IC11 [68] R | IC11 [68] F | IC13 [69] P | IC13 [69] R | IC13 [69] F | IC05 [67] P | IC05 [67] R | IC05 [67] F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Yao et al. [1] | TD-Mixture | CVPR | 2012 | Traditional | ~ | ~ | ~ | 0.69 | 0.66 | 0.67 | ~ | ~ | ~ |
| Yin et al. [2] | | TPAMI | 2013 | | 0.86 | 0.68 | 0.76 | ~ | ~ | ~ | ~ | ~ | ~ |
| Yin et al. [7] | | TPAMI | 2015 | | 0.838 | 0.66 | 0.738 | ~ | ~ | ~ | ~ | ~ | ~ |
| Wu et al. [9] | | TMM | 2015 | | ~ | ~ | ~ | 0.76 | 0.70 | 0.73 | ~ | ~ | ~ |
| Liang et al. [8] | | TIP | 2015 | | 0.77 | 0.68 | 0.71 | 0.76 | 0.68 | 0.72 | ~ | ~ | ~ |
| Michal et al. [12] | FASText | ICCV | 2015 | | ~ | ~ | ~ | 0.84 | 0.69 | 0.77 | ~ | ~ | ~ |
| Li et al. [3] | | TIP | 2014 | Segmentation | 0.80 | 0.62 | 0.70 | ~ | ~ | ~ | ~ | ~ | ~ |
| Zhang et al. [14] | | CVPR | 2016 | | ~ | ~ | ~ | 0.88 | 0.78 | 0.83 | ~ | ~ | ~ |
| He et al. [18] | Text-CNN | TIP | 2016 | | 0.91 | 0.74 | 0.82 | 0.93 | 0.73 | 0.82 | 0.87 | 0.73 | 0.79 |
| Yao et al. [21] | | arXiv | 2016 | | ~ | ~ | ~ | 0.889 | 0.802 | 0.843 | ~ | ~ | ~ |
| Hu et al. [27] | WordSup | ICCV | 2017 | | ~ | ~ | ~ | 0.933 | 0.875 | 0.903 | ~ | ~ | ~ |
| Tang et al. [32] | | TIP | 2017 | | 0.90 | 0.86 | 0.88 | 0.92 | 0.87 | 0.89 | ~ | ~ | ~ |
| Wang et al. [38] | | ICDAR | 2017 | | 0.87 | 0.78 | 0.82 | 0.87 | 0.82 | 0.84 | ~ | ~ | ~ |
| Deng et al. [40] | PixelLink | AAAI | 2018 | | ~ | ~ | ~ | 0.886 | 0.875 | 0.881 | ~ | ~ | ~ |
| Liu et al. [42] | MCN | CVPR | 2018 | | ~ | ~ | ~ | 0.88 | 0.87 | 0.88 | ~ | ~ | ~ |
| Lyu et al. [43] | | CVPR | 2018 | | ~ | ~ | ~ | 0.92 | 0.844 | 0.880 | ~ | ~ | ~ |
| Chu et al. [45] | Border | ECCV | 2018 | | ~ | ~ | ~ | 0.915 | 0.871 | 0.892 | ~ | ~ | ~ |
| Wang et al. [54] | PSENet | CVPR | 2019 | | ~ | ~ | ~ | 0.94 | 0.90 | 0.92 | ~ | ~ | ~ |
| Huang et al. [4] | MSERs-CNN | ECCV | 2014 | | 0.88 | 0.71 | 0.78 | ~ | ~ | ~ | 0.84 | 0.67 | 0.75 |
| Sun et al. [6] | | PR | 2015 | | 0.92 | 0.91 | 0.91 | 0.94 | 0.92 | 0.93 | ~ | ~ | ~ |
| Gupta et al. [15] | FCRN | CVPR | 2016 | Regression | 0.94 | 0.77 | 0.85 | 0.938 | 0.764 | 0.842 | ~ | ~ | ~ |
| Zhong et al. [20] | DeepText | arXiv | 2016 | | 0.87 | 0.83 | 0.85 | 0.85 | 0.81 | 0.83 | ~ | ~ | ~ |
| Liao et al. [22] | TextBoxes | AAAI | 2017 | | 0.89 | 0.82 | 0.86 | 0.89 | 0.83 | 0.86 | ~ | ~ | ~ |
| Liu et al. [25] | DMPNet | CVPR | 2017 | | ~ | ~ | ~ | 0.93 | 0.83 | 0.870 | ~ | ~ | ~ |
| Jiang et al. [36] | R2CNN | arXiv | 2017 | | ~ | ~ | ~ | 0.92 | 0.81 | 0.86 | ~ | ~ | ~ |
| Xing et al. [37] | ArbiText | arXiv | 2017 | | ~ | ~ | ~ | 0.826 | 0.936 | 0.877 | ~ | ~ | ~ |
| Wang et al. [41] | ITN | CVPR | 2018 | | 0.896 | 0.889 | 0.892 | 0.941 | 0.893 | 0.916 | ~ | ~ | ~ |
| Liao et al. [49] | TextBoxes++ | TIP | 2018 | | ~ | ~ | ~ | 0.92 | 0.86 | 0.89 | ~ | ~ | ~ |
| He et al. [50] | | TIP | 2018 | | ~ | ~ | ~ | 0.91 | 0.84 | 0.88 | ~ | ~ | ~ |
| Ma et al. [51] | RRPN | TMM | 2018 | | ~ | ~ | ~ | 0.95 | 0.89 | 0.91 | ~ | ~ | ~ |
| Zhu et al. [55] | SLPR | arXiv | 2018 | | ~ | ~ | ~ | 0.90 | 0.72 | 0.80 | ~ | ~ | ~ |
| Cai et al. [59] | FFN | ICIP | 2018 | | ~ | ~ | ~ | 0.92 | 0.84 | 0.876 | ~ | ~ | ~ |
| Sabyasachi et al. [60] | RGC | ICIP | 2018 | | ~ | ~ | ~ | 0.89 | 0.77 | 0.83 | ~ | ~ | ~ |
| Wang et al. [82] | | CVPR | 2019 | | ~ | ~ | ~ | 0.937 | 0.878 | 0.907 | ~ | ~ | ~ |
| Liu et al. [84] | | CVPR | 2019 | | ~ | ~ | ~ | 0.937 | 0.897 | 0.917 | ~ | ~ | ~ |
| He et al. [29] | SSTD | ICCV | 2017 | | ~ | ~ | ~ | 0.89 | 0.86 | 0.88 | ~ | ~ | ~ |
| Tian et al. [11] | | ICCV | 2015 | | 0.86 | 0.76 | 0.81 | 0.852 | 0.759 | 0.802 | ~ | ~ | ~ |
| Tian et al. [13] | CTPN | ECCV | 2016 | | ~ | ~ | ~ | 0.93 | 0.83 | 0.88 | ~ | ~ | ~ |
| He et al. [19] | | ACCV | 2016 | | ~ | ~ | ~ | 0.90 | 0.75 | 0.81 | ~ | ~ | ~ |
| Shi et al. [23] | SegLink | CVPR | 2017 | | ~ | ~ | ~ | 0.877 | 0.83 | 0.853 | ~ | ~ | ~ |
| Tian et al. [30] | WeText | ICCV | 2017 | | ~ | ~ | ~ | 0.911 | 0.831 | 0.869 | ~ | ~ | ~ |
| Zhu et al. [31] | RTN | ICDAR | 2017 | | ~ | ~ | ~ | 0.94 | 0.89 | 0.91 | ~ | ~ | ~ |
| Ren et al. [34] | | TMM | 2017 | | 0.78 | 0.67 | 0.72 | 0.81 | 0.67 | 0.73 | ~ | ~ | ~ |
| Zhang et al. [10] | | CVPR | 2015 | | 0.84 | 0.76 | 0.80 | 0.88 | 0.74 | 0.80 | ~ | ~ | ~ |
| Tang et al. [52] | SSFT | TMM | 2018 | Hybrid | 0.906 | 0.847 | 0.876 | 0.911 | 0.861 | 0.885 | ~ | ~ | ~ |
| Xie et al. [61] | SPCNet | AAAI | 2019 | | ~ | ~ | ~ | 0.94 | 0.91 | 0.92 | ~ | ~ | ~ |
| Liu et al. [80] | BDN | IJCAI | 2019 | | ~ | ~ | ~ | 0.887 | 0.894 | 0.89 | ~ | ~ | ~ |
| Zhou et al. [24] | EAST | CVPR | 2017 | | ~ | ~ | ~ | 0.93 | 0.83 | 0.870 | ~ | ~ | ~ |
| Yue et al. [48] | | BMVC | 2018 | | ~ | ~ | ~ | 0.885 | 0.846 | 0.870 | ~ | ~ | ~ |
| Zhong et al. [53] | AF-RPN | arXiv | 2018 | | ~ | ~ | ~ | 0.94 | 0.90 | 0.92 | ~ | ~ | ~ |
| Xue et al. [85] | MSR | IJCAI | 2019 | | ~ | ~ | ~ | 0.918 | 0.885 | 0.901 | ~ | ~ | ~ |

2.2.2 Detection Results on Arbitrary-Quadrilateral-Text Datasets

| Method | Model | Source | Time | Method Category | IC15 [70] P | IC15 [70] R | IC15 [70] F | MSRA-TD500 [71] P | MSRA-TD500 [71] R | MSRA-TD500 [71] F | USTB-SV1K [65] P | USTB-SV1K [65] R | USTB-SV1K [65] F | SVT [66] P | SVT [66] R | SVT [66] F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Le et al. [5] | HOCC | CVPR | 2014 | Traditional | ~ | ~ | ~ | 0.71 | 0.62 | 0.66 | ~ | ~ | ~ | ~ | ~ | ~ |
| Yin et al. [7] | | TPAMI | 2015 | | ~ | ~ | ~ | 0.81 | 0.63 | 0.71 | 0.499 | 0.454 | 0.475 | ~ | ~ | ~ |
| Wu et al. [9] | | TMM | 2015 | | ~ | ~ | ~ | 0.63 | 0.70 | 0.66 | ~ | ~ | ~ | ~ | ~ | ~ |
| Tian et al. [17] | | IJCAI | 2016 | | ~ | ~ | ~ | 0.95 | 0.58 | 0.721 | 0.537 | 0.488 | 0.51 | ~ | ~ | ~ |
| Yang et al. [33] | | TIP | 2017 | | ~ | ~ | ~ | 0.95 | 0.58 | 0.72 | 0.54 | 0.49 | 0.51 | ~ | ~ | ~ |
| Liang et al. [8] | | TIP | 2015 | | ~ | ~ | ~ | 0.74 | 0.66 | 0.70 | ~ | ~ | ~ | ~ | ~ | ~ |
| Zhang et al. [14] | | CVPR | 2016 | Segmentation | 0.71 | 0.43 | 0.54 | 0.83 | 0.67 | 0.74 | ~ | ~ | ~ | ~ | ~ | ~ |
| Zhu et al. [16] | | CVPR | 2016 | | 0.81 | 0.91 | 0.85 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| He et al. [18] | Text-CNN | TIP | 2016 | | ~ | ~ | ~ | 0.76 | 0.61 | 0.69 | ~ | ~ | ~ | ~ | ~ | ~ |
| Yao et al. [21] | | arXiv | 2016 | | 0.723 | 0.587 | 0.648 | 0.765 | 0.753 | 0.759 | ~ | ~ | ~ | ~ | ~ | ~ |
| Hu et al. [27] | WordSup | ICCV | 2017 | | 0.793 | 0.77 | 0.782 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Wu et al. [28] | | ICCV | 2017 | | 0.91 | 0.78 | 0.84 | 0.77 | 0.78 | 0.77 | ~ | ~ | ~ | ~ | ~ | ~ |
| Dai et al. [35] | FTSN | arXiv | 2017 | | 0.886 | 0.80 | 0.841 | 0.876 | 0.771 | 0.82 | ~ | ~ | ~ | ~ | ~ | ~ |
| Deng et al. [40] | PixelLink | AAAI | 2018 | | 0.855 | 0.820 | 0.837 | 0.830 | 0.732 | 0.778 | ~ | ~ | ~ | ~ | ~ | ~ |
| Liu et al. [42] | MCN | CVPR | 2018 | | 0.72 | 0.80 | 0.76 | 0.88 | 0.79 | 0.83 | ~ | ~ | ~ | ~ | ~ | ~ |
| Lyu et al. [43] | | CVPR | 2018 | | 0.895 | 0.797 | 0.843 | 0.876 | 0.762 | 0.815 | ~ | ~ | ~ | ~ | ~ | ~ |
| Chu et al. [45] | Border | ECCV | 2018 | | ~ | ~ | ~ | 0.830 | 0.774 | 0.801 | ~ | ~ | ~ | ~ | ~ | ~ |
| Long et al. [46] | TextSnake | ECCV | 2018 | | 0.849 | 0.804 | 0.826 | 0.832 | 0.739 | 0.783 | ~ | ~ | ~ | ~ | ~ | ~ |
| Yang et al. [47] | IncepText | IJCAI | 2018 | | 0.938 | 0.873 | 0.905 | 0.875 | 0.790 | 0.830 | ~ | ~ | ~ | ~ | ~ | ~ |
| Wang et al. [54] | PSENet | CVPR | 2019 | | 0.8692 | 0.845 | 0.8569 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Xu et al. [57] | TextField | arXiv | 2018 | | 0.843 | 0.805 | 0.824 | 0.874 | 0.759 | 0.813 | ~ | ~ | ~ | ~ | ~ | ~ |
| Tian et al. [58] | FTDN | ICIP | 2018 | | 0.847 | 0.773 | 0.809 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Tian et al. [83] | | CVPR | 2019 | | 0.883 | 0.850 | 0.866 | 0.842 | 0.817 | 0.829 | ~ | ~ | ~ | ~ | ~ | ~ |
| Baek et al. [62] | CRAFT | CVPR | 2019 | | 0.898 | 0.843 | 0.869 | 0.882 | 0.782 | 0.829 | ~ | ~ | ~ | ~ | ~ | ~ |
| Richardson et al. [87] | | WACV | 2019 | | 0.853 | 0.83 | 0.827 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Wang et al. [88] | SAST | ACMM | 2019 | | 0.8755 | 0.8734 | 0.8744 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Wang et al. [90] | PAN | ICCV | 2019 | | 0.84 | 0.819 | 0.829 | 0.844 | 0.838 | 0.821 | ~ | ~ | ~ | ~ | ~ | ~ |
| Gupta et al. [15] | FCRN | CVPR | 2016 | Regression | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | 0.651 | 0.599 | 0.624 |
| Liu et al. [25] | DMPNet | CVPR | 2017 | | 0.732 | 0.682 | 0.706 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| He et al. [26] | DDR | ICCV | 2017 | | 0.82 | 0.80 | 0.81 | 0.77 | 0.70 | 0.74 | ~ | ~ | ~ | ~ | ~ | ~ |
| Jiang et al. [36] | R2CNN | arXiv | 2017 | | 0.856 | 0.797 | 0.825 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Xing et al. [37] | ArbiText | arXiv | 2017 | | 0.792 | 0.735 | 0.759 | 0.78 | 0.72 | 0.75 | ~ | ~ | ~ | ~ | ~ | ~ |
| Wang et al. [41] | ITN | CVPR | 2018 | | 0.857 | 0.741 | 0.795 | 0.903 | 0.723 | 0.803 | ~ | ~ | ~ | ~ | ~ | ~ |
| Liao et al. [44] | RRD | CVPR | 2018 | | 0.88 | 0.8 | 0.838 | 0.876 | 0.73 | 0.79 | ~ | ~ | ~ | ~ | ~ | ~ |
| Liao et al. [49] | TextBoxes++ | TIP | 2018 | | 0.878 | 0.785 | 0.829 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| He et al. [50] | | TIP | 2018 | | 0.85 | 0.80 | 0.82 | 0.91 | 0.81 | 0.86 | ~ | ~ | ~ | ~ | ~ | ~ |
| Ma et al. [51] | RRPN | TMM | 2018 | | 0.822 | 0.732 | 0.774 | 0.821 | 0.677 | 0.742 | ~ | ~ | ~ | ~ | ~ | ~ |
| Zhu et al. [55] | SLPR | arXiv | 2018 | | 0.855 | 0.836 | 0.845 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Deng et al. [56] | | arXiv | 2018 | | 0.89 | 0.81 | 0.845 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Sabyasachi et al. [60] | RGC | ICIP | 2018 | | 0.83 | 0.81 | 0.82 | 0.85 | 0.76 | 0.80 | ~ | ~ | ~ | ~ | ~ | ~ |
| Wang et al. [82] | | CVPR | 2019 | | 0.892 | 0.86 | 0.876 | 0.852 | 0.821 | 0.836 | ~ | ~ | ~ | ~ | ~ | ~ |
| He et al. [29] | SSTD | ICCV | 2017 | | 0.80 | 0.73 | 0.77 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Tian et al. [13] | CTPN | ECCV | 2016 | | 0.74 | 0.52 | 0.61 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| He et al. [19] | | ACCV | 2016 | | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | 0.87 | 0.73 | 0.79 |
| Shi et al. [23] | SegLink | CVPR | 2017 | | 0.731 | 0.768 | 0.75 | 0.86 | 0.70 | 0.77 | ~ | ~ | ~ | ~ | ~ | ~ |
| Wang et al. [86] | DSRN | IJCAI | 2019 | | 0.832 | 0.796 | 0.814 | 0.876 | 0.712 | 0.785 | ~ | ~ | ~ | ~ | ~ | ~ |
| Tang et al. [89] | Seglink++ | PR | 2019 | | 0.837 | 0.803 | 0.820 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Wang et al. [92] | ContourNet | CVPR | 2020 | | 0.876 | 0.861 | 0.869 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Tang et al. [52] | SSFT | TMM | 2018 | Hybrid | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | 0.541 | 0.758 | 0.631 |
| Xie et al. [61] | SPCNet | AAAI | 2019 | | 0.89 | 0.86 | 0.87 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Liu et al. [64] | PMTD | arXiv | 2019 | | 0.913 | 0.874 | 0.893 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Liu et al. [80] | BDN | IJCAI | 2019 | | 0.881 | 0.846 | 0.863 | 0.87 | 0.815 | 0.842 | ~ | ~ | ~ | ~ | ~ | ~ |
| Zhang et al. [81] | LOMO | CVPR | 2019 | | 0.878 | 0.876 | 0.877 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Zhou et al. [24] | EAST | CVPR | 2017 | | 0.833 | 0.783 | 0.807 | 0.873 | 0.674 | 0.761 | ~ | ~ | ~ | ~ | ~ | ~ |
| Yue et al. [48] | | BMVC | 2018 | | 0.866 | 0.789 | 0.823 | ~ | ~ | ~ | ~ | ~ | ~ | 0.691 | 0.660 | 0.675 |
| Zhong et al. [53] | AF-RPN | arXiv | 2018 | | 0.89 | 0.83 | 0.86 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Xue et al. [85] | MSR | IJCAI | 2019 | | ~ | ~ | ~ | 0.874 | 0.767 | 0.817 | ~ | ~ | ~ | ~ | ~ | ~ |
| Liao et al. [91] | DB | AAAI | 2020 | | 0.918 | 0.832 | 0.873 | 0.915 | 0.792 | 0.849 | ~ | ~ | ~ | ~ | ~ | ~ |
| Xiao et al. [93] | SDM | ECCV | 2020 | | 0.9196 | 0.8922 | 0.9057 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |

| Method | Model | Source | Time | Method Category | IC15 [70] P | IC15 [70] R | IC15 [70] F | MSRA-TD500 [71] P | MSRA-TD500 [71] R | MSRA-TD500 [71] F | USTB-SV1K [65] P | USTB-SV1K [65] R | USTB-SV1K [65] F | SVT [66] P | SVT [66] R | SVT [66] F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Le et al. [5] | HOCC | CVPR | 2014 | Traditional | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | 0.80 | 0.73 | 0.76 |
| Yao et al. [21] | | arXiv | 2016 | Segmentation | 0.432 | 0.27 | 0.333 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Hu et al. [27] | WordSup | ICCV | 2017 | | 0.452 | 0.309 | 0.368 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Lyu et al. [43] | | CVPR | 2018 | | 0.351 | 0.348 | 0.349 | ~ | ~ | ~ | 0.743 | 0.706 | 0.724 | ~ | ~ | ~ |
| Chu et al. [45] | Border | ECCV | 2018 | | ~ | ~ | ~ | 0.782 | 0.588 | 0.671 | 0.777 | 0.621 | 0.690 | ~ | ~ | ~ |
| Yang et al. [47] | IncepText | IJCAI | 2018 | | ~ | ~ | ~ | 0.785 | 0.569 | 0.660 | ~ | ~ | ~ | ~ | ~ | ~ |
| Wang et al. [54] | PSENet | CVPR | 2019 | | ~ | ~ | ~ | ~ | ~ | ~ | 0.7535 | 0.6918 | 0.7213 | ~ | ~ | ~ |
| Baek et al. [62] | CRAFT | CVPR | 2019 | | ~ | ~ | ~ | ~ | ~ | ~ | 0.806 | 0.682 | 0.739 | ~ | ~ | ~ |
| He et al. [29] | SSTD | ICCV | 2017 | Regression | 0.46 | 0.31 | 0.37 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Gupta et al. [15] | FCRN | CVPR | 2016 | | ~ | ~ | ~ | ~ | ~ | ~ | 0.844 | 0.763 | 0.801 | ~ | ~ | ~ |
| Liao et al. [49] | TextBoxes++ | TIP | 2018 | | 0.61 | 0.57 | 0.59 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Ma et al. [51] | RRPN | TMM | 2018 | | ~ | ~ | ~ | ~ | ~ | ~ | 0.7669 | 0.5794 | 0.6601 | ~ | ~ | ~ |
| Deng et al. [56] | | arXiv | 2018 | | 0.555 | 0.633 | 0.591 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Cai et al. [59] | FFN | ICIP | 2018 | | 0.43 | 0.35 | 0.39 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Xie et al. [79] | DeRPN | AAAI | 2019 | | 0.586 | 0.557 | 0.571 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Liao et al. [44] | RRD | CVPR | 2018 | | ~ | ~ | ~ | 0.591 | 0.775 | 0.670 | ~ | ~ | ~ | ~ | ~ | ~ |
| Richardson et al. [87] | | WACV | 2019 | | ~ | ~ | ~ | ~ | ~ | ~ | 0.729 | 0.618 | 0.669 | ~ | ~ | ~ |
| Wang et al. [88] | SAST | ACMM | 2019 | | ~ | ~ | ~ | ~ | ~ | ~ | 0.7935 | 0.6653 | 0.7237 | ~ | ~ | ~ |
| Xie et al. [61] | SPCNet | AAAI | 2019 | Hybrid | ~ | ~ | ~ | ~ | ~ | ~ | 0.806 | 0.686 | 0.741 | ~ | ~ | ~ |
| Liu et al. [64] | PMTD | arXiv | 2019 | | ~ | ~ | ~ | ~ | ~ | ~ | 0.844 | 0.763 | 0.801 | ~ | ~ | ~ |
| Liu et al. [80] | BDN | IJCAI | 2019 | | ~ | ~ | ~ | ~ | ~ | ~ | 0.791 | 0.698 | 0.742 | ~ | ~ | ~ |
| Zhang et al. [81] | LOMO | CVPR | 2019 | | ~ | ~ | ~ | 0.791 | 0.602 | 0.684 | 0.802 | 0.672 | 0.731 | ~ | ~ | ~ |
| Zhou et al. [24] | EAST | CVPR | 2017 | | 0.504 | 0.324 | 0.395 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
| Zhong et al. [53] | AF-RPN | arXiv | 2018 | | ~ | ~ | ~ | ~ | ~ | ~ | 0.75 | 0.66 | 0.70 | ~ | ~ | ~ |
| Liao et al. [91] | DB | AAAI | 2020 | | ~ | ~ | ~ | ~ | ~ | ~ | 0.831 | 0.679 | 0.747 | ~ | ~ | ~ |
| Xiao et al. [93] | SDM | ECCV | 2020 | | ~ | ~ | ~ | ~ | ~ | ~ | 0.8679 | 0.7526 | 0.8061 | ~ | ~ | ~ |

2.2.3 Detection Results on Irregular-Text Datasets

In this section, we only select those methods suitable for irregular text detection.

| Method | Model | Source | Time | Method Category | Total-text [74] P | Total-text [74] R | Total-text [74] F | SCUT-CTW1500 [75] P | SCUT-CTW1500 [75] R | SCUT-CTW1500 [75] F |
|---|---|---|---|---|---|---|---|---|---|---|
| Baek et al. [62] | CRAFT | CVPR | 2019 | Segmentation | 0.876 | 0.799 | 0.836 | 0.860 | 0.811 | 0.835 |
| Long et al. [46] | TextSnake | ECCV | 2018 | | 0.827 | 0.745 | 0.784 | 0.679 | 0.853 | 0.756 |
| Tian et al. [83] | | CVPR | 2019 | | ~ | ~ | ~ | 0.817 | 0.842 | 0.801 |
| Wang et al. [54] | PSENet | CVPR | 2019 | | 0.840 | 0.779 | 0.809 | 0.848 | 0.797 | 0.822 |
| Wang et al. [88] | SAST | ACMM | 2019 | | 0.8557 | 0.7549 | 0.802 | 0.8119 | 0.8171 | 0.8145 |
| Wang et al. [90] | PAN | ICCV | 2019 | | 0.893 | 0.81 | 0.85 | 0.864 | 0.812 | 0.837 |
| Zhu et al. [55] | SLPR | arXiv | 2018 | Regression | ~ | ~ | ~ | 0.801 | 0.701 | 0.748 |
| Liu et al. [63] | CTD+TLOC | PR | 2019 | | ~ | ~ | ~ | 0.774 | 0.698 | 0.734 |
| Wang et al. [82] | | CVPR | 2019 | | ~ | ~ | ~ | 0.801 | 0.802 | 0.801 |
| Liu et al. [84] | | CVPR | 2019 | | 0.814 | 0.791 | 0.802 | 0.787 | 0.761 | 0.774 |
| Tang et al. [89] | Seglink++ | PR | 2019 | | 0.829 | 0.809 | 0.815 | 0.828 | 0.798 | 0.813 |
| Wang et al. [92] | ContourNet | CVPR | 2020 | | 0.869 | 0.839 | 0.854 | 0.837 | 0.841 | 0.839 |
| Zhang et al. [81] | LOMO | CVPR | 2019 | Hybrid | 0.876 | 0.793 | 0.833 | 0.857 | 0.765 | 0.808 |
| Xie et al. [61] | SPCNet | AAAI | 2019 | | 0.83 | 0.83 | 0.83 | ~ | ~ | ~ |
| Xue et al. [85] | MSR | IJCAI | 2019 | | 0.852 | 0.73 | 0.768 | 0.838 | 0.778 | 0.807 |
| Liao et al. [91] | DB | AAAI | 2020 | | 0.871 | 0.825 | 0.847 | 0.869 | 0.802 | 0.834 |
| Xiao et al. [93] | SDM | ECCV | 2020 | | 0.9085 | 0.8603 | 0.8837 | 0.884 | 0.8442 | 0.8636 |

3. Survey

[A] [TPAMI-2015] Ye Q, Doermann D. Text detection and recognition in imagery: A survey[J]. IEEE transactions on pattern analysis and machine intelligence, 2015, 37(7): 1480-1500. paper
[B] [Frontiers-Comput. Sci-2016] Zhu Y, Yao C, Bai X. Scene text detection and recognition: Recent advances and future trends[J]. Frontiers of Computer Science, 2016, 10(1): 19-36. paper
[C] [arXiv-2018] Long S, He X, Yao C. Scene Text Detection and Recognition: The Deep Learning Era[J]. arXiv preprint arXiv:1811.04256, 2018. paper

4. Evaluation

If you are interested in developing better scene text detection metrics, the references below might be useful.

[A] Wolf, Christian, and Jean-Michel Jolion. "Object count/area graphs for the evaluation of object detection and segmentation algorithms." International Journal of Document Analysis and Recognition (IJDAR) 8.4 (2006): 280-296. paper
[B] D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. K. Ghosh, A. D. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, F. Shafait, S. Uchida, and E. Valveny. ICDAR 2015 competition on robust reading. In ICDAR, pages 1156–1160, 2015. paper
[C] Calarasanu, Stefania, Jonathan Fabrizio, and Severine Dubuisson. "What is a good evaluation protocol for text localization systems? Concerns, arguments, comparisons and solutions." Image and Vision Computing 46 (2016): 1-17. paper
[D] Shi, Baoguang, et al. "ICDAR2017 competition on reading chinese text in the wild (RCTW-17)." 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). Vol. 1. IEEE, 2017. paper
[E] Nayef, N; Yin, F; Bizid, I; et al. ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on, volume 1, 1454–1459. IEEE. paper
[F] Dangla, Aliona, et al. "A first step toward a fair comparison of evaluation protocols for text detection algorithms." 2018 13th IAPR International Workshop on Document Analysis Systems (DAS). IEEE, 2018. paper
[G] He, Mengchao and Liu, Yuliang, et al. ICPR2018 Contest on Robust Reading for Multi-Type Web images. ICPR 2018. paper
[H] Liu, Yuliang and Jin, Lianwen, et al. "Tightness-aware Evaluation Protocol for Scene Text Detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2019. paper code

5. OCR Service

| OCR | API | Free |
|---|---|---|
| Tesseract OCR Engine | × | √ |
| Azure | √ | √ |
| ABBYY | √ | √ |
| OCR Space | √ | √ |
| SODA PDF OCR | √ | √ |
| Free Online OCR | √ | √ |
| Online OCR | √ | √ |
| Super Tools | √ | √ |
| Online Chinese Recognition | √ | √ |
| Calamari OCR | × | √ |
| Tencent OCR | √ | × |

6. References and Code

[1] Yao C, Bai X, Liu W, et al. Detecting texts of arbitrary orientations in natural images. 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012: 1083-1090. Paper
[2] Yin X C, Yin X, Huang K, et al. Robust text detection in natural scene images. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2013, 36(5): 970-83. Paper
[3] Li Y, Jia W, Shen C, et al. Characterness: An indicator of text in the wild. IEEE transactions on image processing, 2014, 23(4): 1666-1677. Paper
[4] Huang W, Qiao Y, Tang X. Robust scene text detection with convolution neural network induced mser trees. European Conference on Computer Vision (ECCV), 2014: 497-511. Paper
[5] Kang L, Li Y, Doermann D. Orientation robust text line detection in natural images.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 4034-4041. Paper
[6] Sun L, Huo Q, Jia W, et al. A robust approach for text detection from natural scene images. Pattern Recognition, 2015, 48(9): 2906-2920. Paper
[7] Yin X C, Pei W Y, Zhang J, et al. Multi-orientation scene text detection with adaptive clustering. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2015 (9): 1930-1937. Paper
[8] Liang G, Shivakumara P, Lu T, et al. Multi-spectral fusion based approach for arbitrarily oriented scene text detection in video images. IEEE Transactions on Image Processing, 2015, 24(11): 4488-4501. Paper
[9] Wu L, Shivakumara P, Lu T, et al. A New Technique for Multi-Oriented Scene Text Line Detection and Tracking in Video. IEEE Trans. Multimedia, 2015, 17(8): 1137-1152. Paper
[10] Zheng Z, Wei S, et al. Symmetry-based text line detection in natural scenes. IEEE Conference on Computer Vision & Pattern Recognition (CVPR), 2015. Paper
[11] Tian S, Pan Y, Huang C, et al. Text flow: A unified text detection system in natural scene images. Proceedings of the IEEE international conference on computer vision (ICCV). 2015: 4651-4659. Paper
[12] Buta M, et al. FASText: Efficient unconstrained scene text detector. 2015 IEEE International Conference on Computer Vision (ICCV). 2015: 1206-1214. Paper
[13] Tian Z, Huang W, He T, et al. Detecting text in natural image with connectionist text proposal network. European conference on computer vision (ECCV), 2016: 56-72. Paper, Code
[14] Zhang Z, Zhang C, Shen W, et al. Multi-oriented text detection with fully convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016: 4159-4167. Paper
[15] Gupta A, Vedaldi A, Zisserman A. Synthetic data for text localisation in natural images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016: 2315-2324. Paper, Code
[16] S. Zhu and R. Zanibbi, A Text Detection System for Natural Scenes with Convolutional Feature Learning and Cascaded Classification, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 625-632. Paper
[17] Tian S, Pei W Y, Zuo Z Y, et al. Scene Text Detection in Video by Learning Locally and Globally. IJCAI. 2016: 2647-2653. Paper
[18] He T, Huang W, Qiao Y, et al. Text-attentional convolutional neural network for scene text detection. IEEE transactions on image processing, 2016, 25(6): 2529-2541. Paper
[19] He, Dafang and Yang, Xiao and Huang, Wenyi and Zhou, Zihan and Kifer, Daniel and Giles, C Lee. Aggregating local context for accurate scene text detection. ACCV, 2016. Paper
[20] Zhong Z, Jin L, Zhang S, et al. Deeptext: A unified framework for text proposal generation and text detection in natural images. arXiv preprint arXiv:1605.07314, 2016. Paper
[21] Yao C, Bai X, Sang N, et al. Scene text detection via holistic, multi-channel prediction. arXiv preprint arXiv:1606.09002, 2016. Paper
[22] Liao M, Shi B, Bai X, et al. TextBoxes: A Fast Text Detector with a Single Deep Neural Network. AAAI. 2017: 4161-4167. Paper, Code
[23] Shi B, Bai X, Belongie S. Detecting Oriented Text in Natural Images by Linking Segments. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 3482-3490. Paper, Code
[24] Zhou X, Yao C, Wen H, et al. EAST: an efficient and accurate scene text detector. CVPR, 2017: 2642-2651. Paper, Code
[25] Liu Y, Jin L. Deep matching prior network: Toward tighter multi-oriented text detection. CVPR, 2017: 3454-3461. Paper
[26] He W, Zhang X Y, Yin F, et al. Deep Direct Regression for Multi-Oriented Scene Text Detection. Proceedings of the IEEE International Conference on Computer Vision (ICCV). 2017: 745-753. Paper
[27] Hu H, Zhang C, Luo Y, et al. Wordsup: Exploiting word annotations for character based text detection. ICCV, 2017. Paper
[28] Wu Y, Natarajan P. Self-organized text detection with minimal post-processing via border learning. ICCV, 2017. Paper
[29] He P, Huang W, He T, et al. Single shot text detector with regional attention. The IEEE International Conference on Computer Vision (ICCV). 2017, 6(7). Paper, Code
[30] Tian S, Lu S, Li C. Wetext: Scene text detection under weak supervision. ICCV, 2017. Paper
[31] Zhu, Xiangyu and Jiang, Yingying et al. Deep Residual Text Detection Network for Scene Text. ICDAR, 2017. Paper
[32] Tang Y, Wu X. Scene Text Detection and Segmentation Based on Cascaded Convolution Neural Networks. IEEE Transactions on Image Processing, 2017, 26(3): 1509-1520. Paper
[33] Yang C, Yin X C, Pei W Y, et al. Tracking Based Multi-Orientation Scene Text Detection: A Unified Framework with Dynamic Programming. IEEE Transactions on Image Processing, 2017. Paper
[34] X. Ren, Y. Zhou, J. He, K. Chen, X. Yang and J. Sun, A Convolutional Neural Network-Based Chinese Text Detection Algorithm via Text Structure Modeling. In IEEE Transactions on Multimedia, vol. 19, no. 3, pp. 506-518, March 2017. Paper
[35] Dai Y, Huang Z, Gao Y, et al. Fused text segmentation networks for multi-oriented scene text detection. arXiv preprint arXiv:1709.03272, 2017. Paper
[36] Jiang Y, Zhu X, Wang X, et al. R2CNN: rotational region CNN for orientation robust scene text detection. arXiv preprint arXiv:1706.09579, 2017. Paper
[37] Xing D, Li Z, Chen X, et al. ArbiText: Arbitrary-Oriented Text Detection in Unconstrained Scene. arXiv preprint arXiv:1711.11249, 2017. Paper
[38] C. Wang, F. Yin and C. Liu, Scene Text Detection with Novel Superpixel Based Character Candidate Extraction. In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 2017, pp. 929-934. Paper
[39] Sheng Zhang, Yuliang Liu, Lianwen Jin et al. Feature Enhancement Network: A Refined Scene Text Detector. In AAAI 2018. Paper
[40] Dan Deng et al. PixelLink: Detecting Scene Text via Instance Segmentation. In AAAI 2018. Paper, Code
[41] Fangfang Wang, Liming Zhao, Xi L et al. Geometry-Aware Scene Text Detection with Instance Transformation Network. In CVPR 2018. Paper
[42] Zichuan Liu, Guosheng Lin, Sheng Yang et al. Learning Markov Clustering Networks for Scene Text Detection. In CVPR 2018. Paper
[43] Pengyuan Lyu, Cong Yao, Wenhao Wu et al. Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation. In CVPR 2018. Paper
[44] Minghui L, Zhen Z, Baoguang S. Rotation-Sensitive Regression for Oriented Scene Text Detection. In CVPR 2018. Paper
[45] Chuhui Xue et al. Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping. In ECCV 2018. Paper
[46] Long, Shangbang and Ruan, Jiaqiang, et al. TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes. In ECCV, 2018. Paper
[47] Qiangpeng Yang, Mengli Cheng et al. IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection. In IJCAI 2018. Paper
[48] Xiaoyu Yue et al. Boosting up Scene Text Detectors with Guided CNN. In BMVC 2018. Paper
[49] Liao M, Shi B, Bai X. TextBoxes++: A Single-Shot Oriented Scene Text Detector. IEEE Transactions on Image Processing, 2018, 27(8): 3676-3690. Paper, Code
[50] W. He, X. Zhang, F. Yin and C. Liu, Multi-Oriented and Multi-Lingual Scene Text Detection With Direct Regression, in IEEE Transactions on Image Processing, vol. 27, no. 11, pp. 5406-5419, 2018. Paper
[51] Ma J, Shao W, Ye H, et al. Arbitrary-oriented scene text detection via rotation proposals. In IEEE Transactions on Multimedia, 2018. Paper, Code
[52] Youbao Tang and Xiangqian Wu. Scene Text Detection Using Superpixel-Based Stroke Feature Transform and Deep Learning Based Region Classification. In TMM, 2018. Paper
[53] Zhuoyao Zhong, Lei Sun and Qiang Huo. An Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches. arXiv preprint arXiv:1804.09003. 2018. Paper
[54] Wenhai W, Enze X, et al. Shape Robust Text Detection with Progressive Scale Expansion Network. In CVPR 2019. Paper, Code
[55] Zhu Y, Du J. Sliding Line Point Regression for Shape Robust Scene Text Detection. arXiv preprint arXiv:1801.09969, 2018. Paper
[56] Linjie D, Yanxiang Gong, et al. Detecting Multi-Oriented Text with Corner-based Region Proposals. arXiv preprint arXiv:1804.02690, 2018. Paper, Code
[57] Yongchao Xu, Yukang Wang, Wei Zhou, et al. TextField: Learning A Deep Direction Field for Irregular Scene Text Detection. arXiv preprint arXiv:1812.01393, 2018. Paper
[58] Xiaowei Tian, Dao Wu, Rui Wang, Xiaochun Cao. Focal Text: an Accurate Text Detection with Focal Loss. In ICIP 2018. Paper
[59] Chenqin C, Pin L, Bing S. Feature Fusion Network for Scene Text Detection. In ICIP, 2018. Paper
[60] Sabyasachi Mohanty et al. Recurrent Global Convolutional Network for Scene Text Detection. In ICIP 2018. Paper
[61] Enze Xie, et al. Scene Text Detection with Supervised Pyramid Context Network. In AAAI 2019. Paper
[62] Youngmin Baek, Bado Lee, et al. Character Region Awareness for Text Detection. In CVPR 2019. Paper
[63] Yuliang L, Lianwen J, Shuaitao Z, et al. Curved Scene Text Detection via Transverse and Longitudinal Sequence Connection. Pattern Recognition, 2019. Paper, Code
[64] Jingchao Liu, Xuebo Liu, et al. Pyramid Mask Text Detector. arXiv preprint arXiv:1903.11800, 2019. Paper, Code
[79] Lele Xie, Yuliang Liu, Lianwen Jin, Zecheng Xie. DeRPN: Taking a further step toward more general object detection. In AAAI, 2019. Paper, Code
[80] Yuliang Liu, Lianwen Jin, et al. Omnidirectional Scene Text Detection with Sequential-free Box Discretization. In IJCAI, 2019. Paper, Code
[81] Chengquan Zhang, Borong Liang, et al. Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes. In CVPR, 2019. Paper
[82] Xiaobing Wang, Yingying Jiang, et al. Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation. In CVPR, 2019. Paper
[83] Zhuotao Tian, Michelle Shu, et al. Learning Shape-Aware Embedding for Scene Text Detection. In CVPR, 2019. Paper
[84] Zichuan Liu, Guosheng Lin, et al. Towards Robust Curve Text Detection with Conditional Spatial Expansion. In CVPR, 2019. Paper
[85] Xue C, Lu S, Zhang W. MSR: multi-scale shape regression for scene text detection. In IJCAI, 2019. Paper
[86] Wang Y, Xie H, Fu Z, et al. DSRN: a deep scale relationship network for scene text detection. In IJCAI, 2019: 947-953. Paper
[87] Elad Richardson, et al. It's All About The Scale -- Efficient Text Detection Using Adaptive Scaling. In WACV, 2020. Paper
[88] Pengfei Wang, et al. A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning. In ACMM, 2019. Paper
[89] Jun Tang, et al. SegLink++: Detecting Dense and Arbitrary-shaped Scene Text by Instance-aware Component Grouping. In PR, 2019. Paper
[90] Wenhai Wang, et al. Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network. In ICCV, 2019. Paper
[91] Minghui Liao, et al. Real-time Scene Text Detection with Differentiable Binarization. In AAAI, 2020. Paper, Code
[92] Wang, Yuxin, et al. ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene Text Detection. CVPR. 2020. Paper, Code
[93] Xiao, et al. Sequential Deformation for Accurate Scene Text Detection. In ECCV, 2020. Paper

Datasets

USTB-SV1K [65]: Xu-Cheng Yin, Xuwang Yin, Kaizhu Huang, and Hong-Wei Hao, Robust text detection in natural scene images, IEEE Trans. Pattern Analysis and Machine Intelligence (TPAMI), preprint, 2013. Paper
SVT [66]: Wang, Kai, and S. Belongie. Word Spotting in the Wild. European Conference on Computer Vision (ECCV), 2010: 591-604. Paper
ICDAR2005 [67]: Lucas, S: ICDAR 2005 text locating competition results. In: ICDAR, 2005. Paper
ICDAR2011 [68]: Shahab, A, Shafait, F, Dengel, A: ICDAR 2011 robust reading competition challenge 2: Reading text in scene images. In: ICDAR, 2011. Paper
ICDAR2013 [69]: D. Karatzas, F. Shafait, S.
Uchida, et al. ICDAR 2013 robust reading competition. In ICDAR, 2013. PaperICDAR2015[70]:D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. K. Ghosh, A. D.Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, F. Shafait, S. Uchida, and E. Valveny. ICDAR 2015 competition on robust reading. In ICDAR, pages 1156–1160, 2015. PaperMSRA-TD500[71]:C. Yao, X. Bai, W. Liu, Y. Ma, and Z. Tu, Detecting texts of arbitrary orientations in natural images. in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2012, pp.1083–1090.PaperCOCO-Text[72]:Veit A, Matera T, Neumann L, et al. Coco-text: Dataset and benchmark for text detection and recognition in natural images. arXiv preprint arXiv:1601.07140, 2016. PaperRCTW-17[73]:Shi B, Yao C, Liao M, et al. ICDAR2017 competition on reading chinese text in the wild (RCTW-17). Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on. IEEE, 2017, 1: 1429-1434. PaperTotal-Text[74]:Chee C K, Chan C S. Total-text: A comprehensive dataset for scene text detection and recognition.Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on. IEEE, 2017, 1: 935-942.PaperSCUT-CTW1500[75]:Yuliang L, Lianwen J, Shuaitao Z, et al. Curved Scene Text Detection via Transverse and Longitudinal Sequence Connection. Pattern Recognition, 2019.PaperMLT 2017[76]: Nayef, N; Yin, F; Bizid, I; et al. ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on, volume 1, 1454–1459. IEEE. PaperOSTD[77]: Chucai Yi and YingLi Tian, Text string detection from natural scenes by structure-based partition and grouping, In IEEE Transactions on Image Processing, vol. 20, no. 9, pp. 2594–2605, 2011. PaperCTW[78]: Yuan T L, Zhu Z, Xu K, et al. Chinese Text in the Wild. arXiv preprint arXiv:1803.00085, 2018. 
Paper
If you find any problems in our resources, or any good papers/codes we have missed, please inform us at liuchongyu1996@gmail.com. Thank you for your contribution.
Copyright
Copyright © 2019 SCUT-DLVC. All Rights Reserved.
2021-03-30
2021-02-28
Installing and configuring aria2 on Ubuntu
Installation
sudo apt-get install aria2
Configuration
sudo mkdir /etc/aria2 # create the config directory
sudo touch /etc/aria2/aria2.session # create the session file
sudo chmod 777 /etc/aria2/aria2.session # make aria2.session writable
sudo vim /etc/aria2/aria2.conf # create the config file
Config file contents
# Download directory (absolute or relative path), default: the directory aria2 was started from
dir=/mnt/sdb1/download
# Disk cache size, 0 disables the cache, requires version 1.16 or later, default: 16M
disk-cache=32M
# File pre-allocation method, reduces disk fragmentation, default: prealloc
# Time needed for pre-allocation: none < falloc ? trunc < prealloc
# falloc and trunc require file system and kernel support
# falloc is recommended for NTFS and trunc for EXT3/4; comment this option out on macOS
file-allocation=trunc
# Resume partially downloaded files
continue=true
## Download connections ##
# Maximum number of concurrent downloads, can be changed at runtime, default: 5
max-concurrent-downloads=10
# Connections per server, can be set when adding a task, default: 1
max-connection-per-server=16
# Minimum split size, can be set when adding a task, range 1M-1024M, default: 20M
# With size=10M, a 20MiB file is downloaded from two sources; a 15MiB file is downloaded from one source
min-split-size=10M
# Maximum number of connections per task, can be set when adding a task, default: 5
split=16
# Overall download speed limit, can be changed at runtime, default: 0 (unlimited)
max-overall-download-limit=0
# Download speed limit per task, default: 0
#max-download-limit=0
# Overall upload speed limit, can be changed at runtime, default: 0
max-overall-upload-limit=80
# Upload speed limit per task, default: 0
#max-upload-limit=1000
# Disable IPv6, default: false
disable-ipv6=false
# Verify the server certificate
check-certificate=false
## Session saving ##
# Read download tasks from the session file
input-file=/etc/aria2/aria2.session
# Save failed/unfinished download tasks to the session file when aria2 exits
save-session=/etc/aria2/aria2.session
# Save the session periodically, 0 saves only on exit, requires version 1.16.1 or later, default: 0
#save-session-interval=60
## RPC settings ##
# Enable RPC, default: false
enable-rpc=true
# Allow all origins, default: false
rpc-allow-origin-all=true
# Listen on all network interfaces (allows external access), default: false
rpc-listen-all=true
# Event polling method, one of [epoll, kqueue, port, poll, select], the default depends on the system
#event-poll=select
# RPC listening port, change it if the port is already in use, default: 6800
rpc-listen-port=6800
# RPC secret token, added in v1.18.4, replaces the --rpc-user and --rpc-passwd options (replace 123456 with your own secret)
rpc-secret=123456
# RPC user name, deprecated in newer versions, use --rpc-secret instead
#rpc-user=<USER>
# RPC password, deprecated in newer versions, use --rpc-secret instead
#rpc-passwd=<PASSWD>
## BT/PT downloads ##
# When the downloaded file is a torrent (ends with .torrent), start the BT task automatically, default: true
follow-torrent=true
# BT listening port, use when the default port is blocked, default: 6881-6999
#listen-port=51413
# Maximum number of peers per torrent, default: 55
#bt-max-peers=55
# Enable DHT, must be disabled for PT, default: true
enable-dht=true
# Enable IPv6 DHT, must be disabled for PT
#enable-dht6=false
# DHT listening port, default: 6881-6999
#dht-listen-port=6881-6999
# Local peer discovery, must be disabled for PT, default: false
#bt-enable-lpd=true
# Peer exchange, must be disabled for PT, default: true
enable-peer-exchange=true
# Per-torrent speed threshold, useful for PT torrents with few seeders, default: 50K
#bt-request-peer-speed-limit=50K
# Client spoofing, needed for PT
peer-id-prefix=-TR2770-
user-agent=Transmission/2.77
# Stop seeding automatically when the share ratio reaches this value, 0 seeds forever, default: 1.0
seed-ratio=0.1
# Force saving the session even if the task has completed, default: false
# In newer versions, enabling this keeps the .aria2 file after the task completes
#force-save=false
# BT hash check before seeding, default: true
#bt-hash-check-seed=true
# Do not verify again when continuing a previous BT task, default: false
bt-seed-unverified=true
# Save magnet link metadata as a torrent file (.torrent), default: false
bt-save-metadata=true
bt-tracker=udp://tracker.coppersurfer.tk:6969/announce,udp://tracker.opentrackr.org:1337/announce,udp://tracker.leechers-paradise.org:6969/announce,udp://p4p.arenabg.com:1337/announce,udp://9.rarbg.to:2710/announce,udp://9.rarbg.me:2710/announce,udp://tracker.internetwarriors.net:1337/announce,udp://exodus.desync.com:6969/announce,udp://tracker.cyberia.is:6969/announce,udp://retracker.lanta-net.ru:2710/announce,udp://tracker.tiny-vps.com:6969/announce,udp://open.stealth.si:80/announce,udp://tracker.torrent.eu.org:451/announce,udp://tracker3.itzmx.com:6961/announce,udp://tracker.moeking.me:6969/announce,udp://bt1.archive.org:6969/announce,udp://ipv4.tracker.harry.lu:80/announce,udp://bt2.archive.org:6969/announce,udp://zephir.monocul.us:6969/announce,udp://valakas.rollo.dnsabr.com:2710/announce,udp://tracker.zerobytes.xyz:1337/announce,udp://tracker.uw0.xyz:6969/announce,udp://tracker.lelux.fi:6969/announce,udp://tracker.kamigami.org:2710/announce,udp://tracker.ds.is:6969/announce,udp://retracker.akado-ural.ru:80/announce,udp://opentracker.i2p.rocks:6969/announce,udp://opentor.org:2710/announce,udp://explodie.org:6969/announce,udp://tracker-udp.gbitt.info:80/announce,udp://chihaya.de:6969/announce,udp://www.loushao.net:8080/announce,udp://u.wwwww.wtf:1/announce,udp://tracker.yoshi210.com:6969/announce,udp://tracker.teambelgium.net:6969/announce,udp://tracker.swateam.org.uk:2710/announce,udp://tracker.skyts.net:6969/announce,udp://tracker.jae.moe:6969/announce,udp://tracker.army:6969/announce,udp://tr2.ysagin.top:2710/announce,udp://t3.leech.ie:1337/announce,udp://t1.leech.ie:1337/announce,udp://retracker.sevstar.net:2710/announce,udp://retracker.netbynet.ru:2710/announce,udp://qg.lorzl.gq:2710/announce,udp://aaa.army:8866/announce,udp://tracker6.dler.org:2710/announce,udp://tracker4.itzmx.com:2710/announce,udp://tracker2.itzmx.com:6961/announce,udp://tracker.filemail.com:6969/announce,udp://tracker.dler.org:6969/announce,udp://tr.bangumi.moe:6969/announce,udp://bt2.54new.com:8080/announce
Configuring the system service
sudo vim /etc/systemd/system/aria2c.service
[Unit]
Description=Aria2c
[Service]
TimeoutStartSec=0
ExecStart=/usr/bin/aria2c --conf-path=/etc/aria2/aria2.conf
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
RestartSec=42s
[Install]
WantedBy=multi-user.target
Starting the service and enabling it at boot
# Reload the systemd configuration
systemctl daemon-reload
# Start the service
systemctl start aria2c
# Enable start at boot
systemctl enable aria2c
References
Installing aria2 on Ubuntu: https://www.jianshu.com/p/1178a669c308
Installing Aria2 and AriaNg on Ubuntu: https://blog.csdn.net/macwinwin/article/details/106985341
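With enable-rpc=true and rpc-secret set as in the config above, downloads can be submitted over aria2's JSON-RPC interface. The following is a minimal sketch, not part of the original post: the function names build_add_uri_request and add_download are hypothetical, and the endpoint assumes aria2c runs locally on the default port 6800. When a secret is configured, aria2 expects it as the first parameter in the form token:<secret>.

```python
import json
import urllib.request


def build_add_uri_request(uri, secret, request_id="demo"):
    """Build the JSON-RPC 2.0 payload for the aria2.addUri method.

    Hypothetical helper for illustration; `secret` is the rpc-secret value.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "aria2.addUri",
        # With rpc-secret set, the first parameter is "token:<secret>",
        # followed by a list of URIs that point to the same resource.
        "params": ["token:" + secret, [uri]],
    }


def add_download(uri, secret, endpoint="http://127.0.0.1:6800/jsonrpc"):
    """POST the request to a running aria2c and return the parsed JSON reply.

    The endpoint is an assumption (local host, default rpc-listen-port).
    """
    data = json.dumps(build_add_uri_request(uri, secret)).encode("utf-8")
    req = urllib.request.Request(
        endpoint, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the payload; call add_download() when aria2c is running.
    payload = build_add_uri_request("https://example.com/file.iso", "123456")
    print(json.dumps(payload, indent=2))
```

Keeping the payload builder separate from the network call makes it possible to inspect the request without a running aria2c instance.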
2021-02-28
Running the Cloudreve private cloud drive as a background service
Create a service file named cloudreve.service in the /etc/systemd/system directory:
sudo vim /etc/systemd/system/cloudreve.service
Enter the following content:
[Unit]
Description=Cloudreve
Documentation=https://docs.cloudreve.org
[Service]
TimeoutStartSec=0
ExecStart=/data/cloudreve/cloudreve
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
RestartSec=5s
[Install]
WantedBy=multi-user.target
Reload the configuration and start the service
# Reload the systemd configuration
systemctl daemon-reload
# Start the service
systemctl start cloudreve
# Enable start at boot
systemctl enable cloudreve
Management commands
# Start the service
systemctl start cloudreve
# Stop the service
systemctl stop cloudreve
# Restart the service
systemctl restart cloudreve
# Check the status
systemctl status cloudreve
References
Running the Cloudreve private cloud drive in the background: https://blog.csdn.net/longzhoufeng/article/details/108958091
2021-02-28
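Changing the DNS server on Ubuntu
Background
Domain name resolution fails.
Solution
Edit the DNS configuration file:
sudo vim /etc/resolv.conf
Add the following lines to the file:
nameserver 114.114.114.114
nameserver 8.8.8.8
Note that on many Ubuntu releases /etc/resolv.conf is regenerated by resolvconf or systemd-resolved, so manual edits may not survive a reboot.
References
curl: (6) Could not resolve host: www.baidu.com: https://blog.csdn.net/qq_32440951/article/details/80825259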
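To double-check which DNS servers the file actually lists after editing, the nameserver lines can be read back programmatically. This is a small sketch rather than part of the original post; the helper name parse_nameservers is hypothetical, and it only understands the simple "nameserver <address>" lines used above.

```python
def parse_nameservers(resolv_conf_text):
    """Return the addresses listed on 'nameserver' lines, in file order.

    Hypothetical helper: lines starting with '#' or ';' are treated as comments.
    """
    servers = []
    for raw_line in resolv_conf_text.splitlines():
        line = raw_line.strip()
        if not line or line[0] in "#;":
            continue  # skip blank lines and comments
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


if __name__ == "__main__":
    sample = "# added manually\nnameserver 114.114.114.114\nnameserver 8.8.8.8\n"
    print(parse_nameservers(sample))
```

To inspect the real file, pass in the contents of /etc/resolv.conf, for example via open("/etc/resolv.conf").read().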
2021年02月28日
660 阅读
0 评论
0 点赞
2021-02-20
IPv4 and IPv6 DDNS for Alibaba Cloud domains in Python
1. Introduction
First, you need a domain registered with Alibaba Cloud: https://www.aliyun.com/minisite/goods?userCode=jdjc69nf
Your IP must also be a public IP; otherwise the DNS record is of no use.
This article shows how to use the Alibaba Cloud SDK to add and modify DNS records. The script checks whether the local IP matches the IP in the DNS record and updates the record automatically when they differ, which gives dynamic DNS. It is mainly intended for home broadband and other connections with dynamic IPs.
2. Install the Alibaba Cloud SDK and other third-party libraries
pip install aliyun-python-sdk-core-v3
pip install aliyun-python-sdk-domain
pip install aliyun-python-sdk-alidns
pip install requests
3. Obtain the accessKeyId and accessSecret
You can get these directly from the account centre in the Alibaba Cloud console, but it is generally recommended to use RAM for access control, so that the accessKey and accessSecret can only operate on domains and not on other resources, which is safer.
RAM quick start: https://help.aliyun.com/document_detail/28637.html?source=5176.11533457&userCode=jdjc69nf
4. Source code download (official)
4.1 Download links
gitee: https://gitee.com/zeruns/aliddns_Python
github: https://github.com/zeruns/-Python-aliddns_ipv4-ipv6
Download the aliddns.py file, open it with Notepad++ or another editor, then modify it according to the comments and save it.
Then run it to check that it works: open cmd and enter python followed by the path to the script.
4.2 Source code backup
from aliyunsdkcore.client import AcsClient
from aliyunsdkcore.acs_exception.exceptions import ClientException
from aliyunsdkcore.acs_exception.exceptions import ServerException
from aliyunsdkalidns.request.v20150109.DescribeSubDomainRecordsRequest import DescribeSubDomainRecordsRequest
from aliyunsdkalidns.request.v20150109.DescribeDomainRecordsRequest import DescribeDomainRecordsRequest
import requests
from urllib.request import urlopen
import json

ipv4_flag = 1  # enable IPv4 DDNS: 1 = on, 0 = off
ipv6_flag = 1  # enable IPv6 DDNS: 1 = on, 0 = off
accessKeyId = "accessKeyId"  # replace with your own accessKeyId
accessSecret = "accessSecret"  # replace with your own accessSecret
domain = "4v7p.top"  # your main domain
name_ipv4 = "ipv4.test"  # subdomain to use for IPv4 DDNS
name_ipv6 = "ipv6.test"  # subdomain to use for IPv6 DDNS

client = AcsClient(accessKeyId, accessSecret, 'cn-hangzhou')

def update(RecordId, RR, Type, Value):  # modify an existing DNS record
    from aliyunsdkalidns.request.v20150109.UpdateDomainRecordRequest import UpdateDomainRecordRequest
    request = UpdateDomainRecordRequest()
    request.set_accept_format('json')
    request.set_RecordId(RecordId)
    request.set_RR(RR)
    request.set_Type(Type)
    request.set_Value(Value)
    response = client.do_action_with_exception(request)

def add(DomainName, RR, Type, Value):  # add a new DNS record
    from aliyunsdkalidns.request.v20150109.AddDomainRecordRequest import AddDomainRecordRequest
    request = AddDomainRecordRequest()
    request.set_accept_format('json')
    request.set_DomainName(DomainName)
    request.set_RR(RR)  # https://blog.zeruns.tech
    request.set_Type(Type)
    request.set_Value(Value)
    response = client.do_action_with_exception(request)

if ipv4_flag == 1:
    request = DescribeSubDomainRecordsRequest()
    request.set_accept_format('json')
    request.set_DomainName(domain)
    request.set_SubDomain(name_ipv4 + '.' + domain)
    response = client.do_action_with_exception(request)  # fetch the list of DNS records
    domain_list = json.loads(response)  # convert the returned JSON into Python objects
    ip = urlopen('https://api-ipv4.ip.sb/ip').read()  # get the IPv4 address from the IP.SB API
    ipv4 = str(ip, encoding='utf-8')
    print("Got IPv4 address: %s" % ipv4)
    if domain_list['TotalCount'] == 0:
        add(domain, name_ipv4, "A", ipv4)
        print("DNS record created")
    elif domain_list['TotalCount'] == 1:
        if domain_list['DomainRecords']['Record'][0]['Value'].strip() != ipv4.strip():
            update(domain_list['DomainRecords']['Record'][0]['RecordId'], name_ipv4, "A", ipv4)
            print("DNS record updated")
        else:  # https://blog.zeruns.tech
            print("IPv4 address unchanged")
    elif domain_list['TotalCount'] > 1:
        # more than one record exists: delete them all, then add a fresh one
        from aliyunsdkalidns.request.v20150109.DeleteSubDomainRecordsRequest import DeleteSubDomainRecordsRequest
        request = DeleteSubDomainRecordsRequest()
        request.set_accept_format('json')
        request.set_DomainName(domain)  # https://blog.zeruns.tech
        request.set_RR(name_ipv4)
        response = client.do_action_with_exception(request)
        add(domain, name_ipv4, "A", ipv4)
        print("DNS record updated")

print("This program is copyrighted by zeruns, blog: https://blog.zeruns.tech")

if ipv6_flag == 1:
    request = DescribeSubDomainRecordsRequest()
    request.set_accept_format('json')
    request.set_DomainName(domain)
    request.set_SubDomain(name_ipv6 + '.' + domain)
    response = client.do_action_with_exception(request)  # fetch the list of DNS records
    domain_list = json.loads(response)  # convert the returned JSON into Python objects
    ip = urlopen('https://api-ipv6.ip.sb/ip').read()  # get the IPv6 address from the IP.SB API
    ipv6 = str(ip, encoding='utf-8')
    print("Got IPv6 address: %s" % ipv6)
    if domain_list['TotalCount'] == 0:
        add(domain, name_ipv6, "AAAA", ipv6)
        print("DNS record created")
    elif domain_list['TotalCount'] == 1:
        if domain_list['DomainRecords']['Record'][0]['Value'].strip() != ipv6.strip():
            update(domain_list['DomainRecords']['Record'][0]['RecordId'], name_ipv6, "AAAA", ipv6)
            print("DNS record updated")
        else:  # https://blog.zeruns.tech
            print("IPv6 address unchanged")
    elif domain_list['TotalCount'] > 1:
        # more than one record exists: delete them all, then add a fresh one
        from aliyunsdkalidns.request.v20150109.DeleteSubDomainRecordsRequest import DeleteSubDomainRecordsRequest
        request = DeleteSubDomainRecordsRequest()
        request.set_accept_format('json')
        request.set_DomainName(domain)
        request.set_RR(name_ipv6)  # https://blog.zeruns.tech
        response = client.do_action_with_exception(request)
        add(domain, name_ipv6, "AAAA", ipv6)
        print("DNS record updated")
6. Setting up a scheduled task
6.1 Linux
Use crontab -e to add the scheduled task.
6.2 Windows
Right-click the bottom-left corner of the screen and click Computer Management. Click Task Scheduler, then Create Task, and enter a name for the task.
Create a new trigger; the repeat interval is up to you, and the duration should be set to indefinitely.
Create a new action. This step is important: a wrong configuration here means the script will not run.
Finally, confirm.
References
Python DDNS for Alibaba Cloud domains with IPv4 and IPv6 support: https://developer.aliyun.com/article/755182
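The branching on TotalCount in the script above can be summarised as a small pure function, which makes the four possible outcomes easy to check without calling the Alibaba Cloud API. This is an illustrative sketch only; the function name plan_ddns_action and the action labels are hypothetical and do not appear in the original script.

```python
def plan_ddns_action(total_count, recorded_value, current_ip):
    """Decide what to do with the DNS record, mirroring the script's branches.

    total_count    -- number of existing records for the subdomain
    recorded_value -- value of the single existing record (used when total_count == 1)
    current_ip     -- the address just fetched for this machine
    """
    if total_count == 0:
        return "add"      # no record yet: create one
    if total_count == 1:
        if recorded_value.strip() != current_ip.strip():
            return "update"  # record exists but points to an old address
        return "skip"        # record is already correct
    return "replace"      # several records: delete them all, then add one


if __name__ == "__main__":
    print(plan_ddns_action(0, "", "203.0.113.7"))
    print(plan_ddns_action(1, "203.0.113.7\n", "203.0.113.7"))
    print(plan_ddns_action(1, "203.0.113.1", "203.0.113.7"))
    print(plan_ddns_action(3, "", "203.0.113.7"))
```

The strip() calls matter because the address returned by the IP lookup service ends with a newline, so a raw string comparison would report a change on every run.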
2021-02-19
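IoU calculation
IoU calculation, testing whether two rectangles intersect, and finding the intersection region
Illustration 1
$$ IOU = \frac{area}{area1 + area2 - area} $$
Illustration 2
Theory: testing whether two rectangles intersect and finding the intersection region
Problem: given two rectangles A and B, rectangle A has top-left corner (Xa1,Ya1) and bottom-right corner (Xa2,Ya2), and rectangle B has top-left corner (Xb1,Yb1) and bottom-right corner (Xb2,Yb2).
1. Design an algorithm to determine whether the two rectangles intersect (i.e. have an overlapping region)
The usual idea is to check whether any of the four vertices of one rectangle lies inside the other rectangle. This is the simplest approach, but it is inefficient and also incorrect, as analysed below.
As in the figure above, rectangle intersection (overlap) can be divided into three cases (other divisions are possible). In the third case, (3) in the figure, the two rectangles intersect but no vertex of either rectangle lies inside the other, so the vertex check fails to detect this kind of intersection.
Looking at the figure suggests another approach: compare the horizontal and vertical distances between the centres of the two rectangles. The rectangles intersect when both distances satisfy a certain condition.
Rectangle A: width Wa = Xa2-Xa1, height Ha = Ya2-Ya1
Rectangle B: width Wb = Xb2-Xb1, height Hb = Yb2-Yb1
Centre of rectangle A: (Xa3,Ya3) = ( (Xa2+Xa1)/2 ,(Ya2+Ya1)/2 )
Centre of rectangle B: (Xb3,Yb3) = ( (Xb2+Xb1)/2 ,(Yb2+Yb1)/2 )
The two rectangles intersect when both of the following hold:
1) | Xb3-Xa3 | <= Wa/2 + Wb/2
2) | Yb3-Ya3 | <= Ha/2 + Hb/2
That is:
| Xb2+Xb1-Xa2-Xa1 | <= Xa2-Xa1 + Xb2-Xb1
| Yb2+Yb1-Ya2-Ya1 | <= Ya2-Ya1 + Yb2-Yb1
2. If the two rectangles intersect, design an algorithm to find the intersection rectangle
Xc1 = max(Xa1,Xb1)
Yc1 = max(Ya1,Yb1)
Xc2 = min(Xa2,Xb2)
Yc2 = min(Ya2,Yb2)
This gives the intersection region.
Note also that, without assuming the rectangles intersect, (Xc1,Yc1) and (Xc2,Yc2) can still be defined by the four formulas above, and their values can then be used to test for intersection. The two rectangles intersect when Xc1, Yc1, Xc2, Yc2 satisfy both of the following:
3) Xc1 <= Xc2
4) Yc1 <= Yc2
That is:
max(Xa1,Xb1) <= min(Xa2,Xb2)
max(Ya1,Yb1) <= min(Ya2,Yb2)
Code implementation
Code
"""
IoU calculation
+ input
    + box1: [box1_x1, box1_y1, box1_x2, box1_y2]
    + box2: [box2_x1, box2_y1, box2_x2, box2_y2]
+ output
    + IoU value
"""
def cal_iou(box1, box2):
    # Check whether the boxes can intersect (centre-distance test on each axis)
    if abs(box2[2]+box2[0]-box1[2]-box1[0]) > box2[2]-box2[0]+box1[2]-box1[0]:
        return 0
    if abs(box2[3]+box2[1]-box1[3]-box1[1]) > box2[3]-box2[1]+box1[3]-box1[1]:
        return 0
    # Top-left and bottom-right corners of the intersection region
    box_intersect_x1 = max(box1[0], box2[0])
    box_intersect_y1 = max(box1[1], box2[1])
    box_intersect_x2 = min(box1[2], box2[2])
    box_intersect_y2 = min(box1[3], box2[3])
    # Area of the intersection
    area_intersect = (box_intersect_y2 - box_intersect_y1) * (box_intersect_x2 - box_intersect_x1)
    # Areas of box1 and box2
    area_box1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area_box2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    # Area of the union
    area_union = area_box1 + area_box2 - area_intersect
    # IoU (intersection over union)
    iou = area_intersect / area_union
    return iou
Verification
box1 = [0,0,500,500]
box2 = [250,250,750,750]
iou = cal_iou(box1,box2)
print(iou)
0.14285714285714285
Manual verification with a plot
import matplotlib.pyplot as plt
fig1 = plt.figure()
ax1 = fig1.add_subplot(111, aspect='equal')
ax1.add_patch(plt.Rectangle((0, 0),500,500,color='b',alpha=0.5))
ax1.add_patch(plt.Rectangle((250, 250),500,500,color='b',alpha=0.5))
ax1.add_patch(plt.Rectangle((250, 250),250,250,color='r',alpha=0.5))
plt.xlim(0, 750)
plt.ylim(0, 750)
plt.show()
From the plot:
area_box1 = 250000
area_box2 = 250000
area_intersect = 62500
area_union = 437500
Therefore: iou = 62500 / 437500 = 0.14285714285714285
References
Reading the IOU code and principle in the YOLO algorithm: https://blog.csdn.net/caokaifa/article/details/80724842
IOU calculation: https://www.cnblogs.com/darkknightzh/p/9043395.html
Testing whether two rectangles intersect and finding the intersection region: https://www.cnblogs.com/zhoug2020/p/7451340.html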
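The post gives two intersection tests: the centre-distance conditions 1) and 2), and the corner conditions 3) and 4). The sketch below, which is not from the original post and uses hypothetical function names, implements both so they can be compared on a few boxes, including the cross-shaped case (3) where no vertex of either rectangle lies inside the other. It assumes boxes are given as [x1, y1, x2, y2] with x1 <= x2 and y1 <= y2.

```python
def overlaps_by_center(a, b):
    """Conditions 1) and 2): compare doubled centre distances with summed sizes."""
    return (abs(b[2] + b[0] - a[2] - a[0]) <= (a[2] - a[0]) + (b[2] - b[0])
            and abs(b[3] + b[1] - a[3] - a[1]) <= (a[3] - a[1]) + (b[3] - b[1]))


def overlaps_by_corners(a, b):
    """Conditions 3) and 4): the candidate intersection corners must be ordered."""
    return (max(a[0], b[0]) <= min(a[2], b[2])
            and max(a[1], b[1]) <= min(a[3], b[3]))


if __name__ == "__main__":
    cases = [
        ([0, 0, 500, 500], [250, 250, 750, 750]),  # partial overlap
        ([0, 40, 100, 60], [40, 0, 60, 100]),      # cross shape, no vertex inside
        ([0, 0, 10, 10], [20, 20, 30, 30]),        # disjoint
        ([0, 0, 10, 10], [10, 0, 20, 10]),         # touching along one edge
    ]
    for a, b in cases:
        print(a, b, overlaps_by_center(a, b), overlaps_by_corners(a, b))
```

Both tests use <=, so boxes that only touch along an edge count as intersecting even though their intersection area, and therefore their IoU, is zero.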
2021-02-18
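Common deep learning datasets [repost]
1. Transfer learning (conventional neural networks)
1. Cat and dog dataset: link: https://pan.baidu.com/s/1TqmdkJBY49ftg19tRK2Ngg extraction code: htxf
2. Object detection
1. VOC2007+2012 training set: link: https://pan.baidu.com/s/1u4YUyWJqs5bD38A6Hvs-og extraction code: xzde
3. Instance segmentation
1. Shape dataset (circles, triangles, squares): link: https://pan.baidu.com/s/14dBd1Lbjw0FCnwKryf9taQ extraction code: 9457
4. Semantic segmentation (old version)
1. Zebra crossing dataset: link: https://pan.baidu.com/s/1uzwqLaCXcWe06xEXk1ROWw extraction code: pp6w
2. VOC dataset: link: https://pan.baidu.com/s/1Urh9W7XPNMF8yR67SDjAAQ extraction code: cvy2
5. Semantic segmentation (new version)
1. Extended VOC dataset and its validation set: link: https://pan.baidu.com/s/1BrR7AUM1XJvPWjKMIy2uEw extraction code: vszf
6. Face recognition
The face recognition datasets are included in the Baidu Netdisk folders of the corresponding weights.
1. retinaface: link: https://pan.baidu.com/s/1t7-BNsZzHj2isCekc_PVtw extraction code: 2qrs
2. retinaface-pytorch: link: https://pan.baidu.com/s/1q2E6uWs0R5GU_PFs9_vglg extraction code: z7es
References
1. Neural network study notes 44: training resource collection: https://blog.csdn.net/weixin_44791964/article/details/105123842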
2021-02-18
YOLOv3 study notes (3): decoding the model output
YOLOv3 model output
Decoding the model output: theory (using the 13*13 output as an example)
Decoding goal
Model output shape: [batch_size, 255, 13, 13]
255 = 3 (number of anchor boxes) * (x_offset + y_offset + w_scale + h_scale + objectness confidence + class confidences) = 3 * (4 + 1 + 80)
In other words, the model divides the image into a 13*13 grid for prediction. Each cell predicts 3 boxes from the anchor boxes; each predicted box is positioned relative to the top-left corner of its cell and sized relative to the w and h of its anchor box.
$$ w_{pred} = w_{anchor} \times e^{w\_scale} $$
$$ h_{pred} = h_{anchor} \times e^{h\_scale} $$
The goal of decoding is to correct the x_offset, y_offset, w_scale and h_scale parts of the output so that they are expressed relative to the top-left corner (0,0) of the whole image, and to scale the w and h of each predicted box according to its anchor box. The result is 3*13*13 predicted boxes.
Decoded output shape: [batch_size, 3*13*13, 85]
85 = x_offset + y_offset + w_scale + h_scale + objectness confidence + class confidences
Decoding the model output: code
# YOLOv3 hyperparameters
import torch
import torch.nn as nn
from easydict import EasyDict

super_param = {
    "anchors": [[[116, 90], [156, 198], [373, 326]],
                [[30, 61], [62, 45], [59, 119]],
                [[10, 13], [16, 30], [33, 23]]],
    "num_classes": 80,
    "img_size": (416, 416),
}
super_param = EasyDict(super_param)
print(super_param.img_size)

# YOLOv3 model output decoder
"""
Model output, using [batch_size, 255, 13, 13] as an example:
    255 = 3 (number of anchor boxes) * (x_offset + y_offset + w + h + objectness confidence + class confidences)
    The image is divided into a 13*13 grid and each cell predicts 3 boxes.
    The centre of each box is (cell top-left x + x_offset, cell top-left y + y_offset).
    The w and h of each box are torch.exp(w.data) * anchor_w and torch.exp(h.data) * anchor_h.
Decoded output:
    For the example the output shape is [batch_size, 3*13*13, 85], i.e. 3*13*13 boxes are predicted.
    Each box has (x + y + w + h + objectness confidence + 80 class confidences) = 85 values.
"""
class DecodeBox(nn.Module):
    def __init__(self, anchors=super_param.anchors[0], num_classes=super_param.num_classes, img_size=super_param.img_size):
        super(DecodeBox, self).__init__()
        self.anchors = anchors
        self.num_anchors = len(anchors)
        self.num_classes = num_classes
        self.img_size = img_size

    def forward(self, input):
        # Shape information of one YOLOv3 output branch
        batch_size, input_height, input_width = input.size(0), input.size(2), input.size(3)
        # Stride of this feature map relative to the input image
        stride_h, stride_w = self.img_size[1] / input_height, self.img_size[0] / input_width
        # Scale the anchors to the feature map,
        # e.g. [116, 90], [156, 198], [373, 326] --> [116/32, 90/32], [156/32, 198/32], [373/32, 326/32]
        scaled_anchors = [(anchor_width / stride_w, anchor_height / stride_h) for anchor_width, anchor_height in self.anchors]
        # Reshape the prediction,
        # e.g. [batch_size, 255, 13, 13] --> [batch_size, num_anchors, input_height, input_width, 5 + num_classes] (batch_size, 3, 13, 13, 85)
        # The 85 values are 4 + 1 + 80: x_offset, y_offset, w and h, objectness confidence, class scores.
        prediction = input.view(batch_size, self.num_anchors, 5 + self.num_classes, input_height, input_width).permute(0, 1, 3, 4, 2).contiguous()
        # Centre offsets relative to the cell
        x_offset, y_offset = torch.sigmoid(prediction[..., 0]), torch.sigmoid(prediction[..., 1])
        # Width and height scale parameters
        w, h = prediction[..., 2], prediction[..., 3]
        # Objectness confidence
        conf = torch.sigmoid(prediction[..., 4])
        # Class confidences
        pred_cls = torch.sigmoid(prediction[..., 5:])
        FloatTensor = torch.cuda.FloatTensor if x_offset.is_cuda else torch.FloatTensor
        LongTensor = torch.cuda.LongTensor if x_offset.is_cuda else torch.LongTensor
        # Grid of cell top-left corners, one entry per anchor and cell
        grid_x = torch.linspace(0, input_width - 1, input_width).repeat(input_height, 1).repeat(
            batch_size * self.num_anchors, 1, 1).view(x_offset.shape).type(FloatTensor)
        grid_y = torch.linspace(0, input_height - 1, input_height).repeat(input_width, 1).t().repeat(
            batch_size * self.num_anchors, 1, 1).view(y_offset.shape).type(FloatTensor)
        # Anchor widths and heights, expanded to the prediction shape
        anchor_w = FloatTensor(scaled_anchors).index_select(1, LongTensor([0]))
        anchor_h = FloatTensor(scaled_anchors).index_select(1, LongTensor([1]))
        anchor_w = anchor_w.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(w.shape)
        anchor_h = anchor_h.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(h.shape)
        # Adjusted box centres, widths and heights (in feature-map units)
        pred_boxes = FloatTensor(prediction[..., :4].shape)
        pred_boxes[..., 0] = x_offset.data + grid_x
        pred_boxes[..., 1] = y_offset.data + grid_y
        pred_boxes[..., 2] = torch.exp(w.data) * anchor_w
        pred_boxes[..., 3] = torch.exp(h.data) * anchor_h
        # Scale the boxes back to the 416x416 input size
        _scale = torch.Tensor([stride_w, stride_h] * 2).type(FloatTensor)
        output = torch.cat((pred_boxes.view(batch_size, -1, 4) * _scale, conf.view(batch_size, -1, 1), pred_cls.view(batch_size, -1, self.num_classes)), -1)
        return output.data
Test
fake_out1 = torch.zeros((1,255,13,13))
print(fake_out1.shape)
decoder = DecodeBox()
out1_decode = decoder(fake_out1)
print(out1_decode.shape)
torch.Size([1, 255, 13, 13])
torch.Size([1, 507, 85])
References
Building your own YOLO3 object detection platform with Pytorch (Bubbliiiing deep learning tutorial): https://www.bilibili.com/video/BV1Hp4y1y788?p=11&spm_id_from=pageDriver
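The decoding arithmetic for a single predicted box can be worked through with plain Python, without tensors. The following is an illustrative sketch and not part of the original post: the function name decode_one_box is hypothetical, and the example values (cell (6, 6), stride 32 = 416 / 13, anchor 116 x 90, all raw outputs zero as in the fake_out1 test above) are chosen for illustration.

```python
import math


def decode_one_box(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h, stride):
    """Decode one raw prediction into (centre x, centre y, w, h) in image pixels.

    tx, ty, tw, th     -- raw network outputs for this box
    cell_x, cell_y     -- column and row index of the grid cell
    anchor_w, anchor_h -- anchor size in image pixels
    stride             -- image size divided by grid size (416 / 13 = 32 here)
    """
    def sigmoid(v):
        return 1.0 / (1.0 + math.exp(-v))

    # The centre is the cell's top-left corner plus a sigmoid offset, scaled to pixels.
    cx = (cell_x + sigmoid(tx)) * stride
    cy = (cell_y + sigmoid(ty)) * stride
    # Width and height scale the anchor exponentially.
    w = anchor_w * math.exp(tw)
    h = anchor_h * math.exp(th)
    return cx, cy, w, h


if __name__ == "__main__":
    # All-zero raw outputs: the offset is sigmoid(0) = 0.5 and the scale is e^0 = 1.
    print(decode_one_box(0.0, 0.0, 0.0, 0.0, 6, 6, 116, 90, 32))
    # → (208.0, 208.0, 116.0, 90.0)
```

Dividing the anchors by the stride and multiplying the decoded box by the stride afterwards, as the DecodeBox class does, gives the same width and height as applying the exponential directly to the pixel-sized anchor.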