Heary's Blog

Fixing the CentOS yum Error: Failed to download metadata for repo 'appstream'

2022-03-13T11:03:52.000Z

This post explains how to fix the following error, which appears on CentOS 8 when running yum update: Error: Failed to download metadata for repo 'appstream': Cannot prepare internal mirrorlist: No URLs in mirrorlist.

Fixing the CentOS yum Error: Failed to download metadata for repo 'appstream'

1 Problem Description

On CentOS 8, running yum update fails with the following error:

Error: Failed to download metadata for repo 'appstream': Cannot prepare internal mirrorlist: No URLs in mirrorlist

The message indicates that no usable URL could be found in the mirror list, so the metadata of the appstream repo cannot be fetched.

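2 Solutions

2.1 The Fix Found in Existing References

The references I found solve the problem by using sed to search and replace, in bulk, the repository settings under /etc/yum.repos.d/.

Quoted from: https://www.cnblogs.com/EthanWong/p/15932675.html

# Change into the yum.repos.d directory
cd /etc/yum.repos.d/
# Comment out the mirrorlist entries
sed -i 's/mirrorlist/#mirrorlist/g' /etc/yum.repos.d/CentOS-*
# Enable baseurl and replace mirror.centos.org with vault.centos.org
sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-*

The article above points out that CentOS Linux 8 has been unmaintained since 2021-10-31, so from then on updates must be fetched from vault.centos.org. Accordingly, its fix comments out the mirrorlist field in every repository configuration file under /etc/yum.repos.d/, so that the mirror list is no longer used, and enables baseurl to connect to the origin site directly, with the origin changed to http://vault.centos.org.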

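2.2 The Fix Proposed Here

I think the configuration change above may well work, but the root cause is that Red Hat changed the open-source model of CentOS: later iterations of CentOS are delivered as CentOS Stream. In the past, CentOS shared its core code with RHEL, and RHEL merely added some value-added software and services. CentOS Stream now serves as the "development version" of RHEL: newly developed code is first released to CentOS Stream for validation and is then merged into RHEL. Code passes through two successive trial stages, Fedora Linux and then CentOS Stream, before it is merged into RHEL, which safeguards RHEL's high reliability.

CentOS Stream
Continuously delivered distro that tracks just ahead of Red Hat Enterprise Linux (RHEL) development, positioned as a midstream between Fedora Linux and RHEL. For anyone interested in participating and collaborating in the RHEL ecosystem, CentOS Stream is your reliable platform for innovation.

Because of this change, the EOL of CentOS Linux 8 was moved to the end of 2021. From 2022 on, the mirrorlist for CentOS Linux 8 was taken down and its code is no longer maintained.

The best practice for solving this problem is therefore to follow the official documentation and migrate the already-EOL CentOS Linux 8 to the CentOS Stream 8 release branch (or to switch to another Linux distribution, such as Debian).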

https://www.centos.org/centos-stream/

dnf --disablerepo '*' --enablerepo extras swap centos-linux-repos centos-stream-repos
dnf distro-sync

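dnf is the default package manager of CentOS 8. The commands above use dnf to swap the repository configuration and synchronize the installed packages, which migrates the system to the CentOS Stream release branch.

DNF, which stands for Dandified YUM, is a software package manager for RPM-based Linux distributions. It is used to install, update and remove packages in the CentOS operating system. It is the default package manager of CentOS 8.

After migration, the EOL of CentOS Stream 8 is May 31st, 2024.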

slice.go - Understanding Go's Slice Container

2021-11-01T14:44:43.000Z

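Reading the Go source code to understand the data structure and algorithms behind the built-in slice container.

slice.go - Understanding Go's Slice Container

The implementation of slice lives in runtime/slice.go and is only 318 lines long in total.

This post uses Go 1.17.2, the latest version of the Go source at the time of writing, as its reference.

Data Structure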

type slice struct {
array unsafe.Pointer
len   int
cap   int
}

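The slice data structure is not complicated: it is essentially a thin wrapper around an array, similar to ArrayList in Java.

The underlying data of a slice is stored in array; len records the number of elements currently stored, and cap records the element capacity of the memory object that the array pointer points to.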
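The relationship between the backing array, len and cap can be observed from ordinary Go code. The following is a minimal sketch; the helper name describe and the sample values are illustrative and do not come from the runtime source.

```go
package main

import "fmt"

// describe reports the length and capacity of a slice.
// (Hypothetical helper, used only for this illustration.)
func describe(s []int) (length, capacity int) {
	return len(s), cap(s)
}

func main() {
	backing := [5]int{1, 2, 3, 4, 5}

	// s shares storage with backing. It starts at index 1, so its
	// capacity is the number of elements from index 1 to the end.
	s := backing[1:3]

	l, c := describe(s)
	fmt.Println(l, c) // 2 4

	// Writing through the slice is visible in the backing array,
	// because the slice header only points at that memory.
	s[0] = 20
	fmt.Println(backing[1]) // 20
}
```

The slice header itself is only three words (pointer, length, capacity), which is why passing a slice to a function is cheap and why both names above see the same elements.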

Algorithms

Construction (makeslice)

func makeslice(et *_type, len, cap int) unsafe.Pointer {
mem, overflow := math.MulUintptr(et.size, uintptr(cap))
if overflow || mem > maxAlloc || len < 0 || len > cap {
// NOTE: Produce a 'len out of range' error instead of a
// 'cap out of range' error when someone does make([]T, bignumber).
// 'cap out of range' is true too, but since the cap is only being
// supplied implicitly, saying len is clearer.
// See golang.org/issue/4085.
mem, overflow := math.MulUintptr(et.size, uintptr(len))
if overflow || mem > maxAlloc || len < 0 {
panicmakeslicelen()
}
panicmakeslicecap()
}

return mallocgc(mem, et, true)
}

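Construction takes et as input, short for ElementType, which describes the type of the elements stored in the slice.

First, the math.MulUintptr function performs a uintptr multiplication with overflow detection.

https://pkg.go.dev/runtime/internal/math#MulUintptr
https://cs.opensource.google/go/go/+/go1.17.2:src/runtime/internal/math/math.go;l=13
The implementation of math.MulUintptr is quite clever; it is not examined further here.

Then, based on the computed memory size, the mallocgc function (located in runtime/malloc.go and based on the TCMalloc design) allocates the corresponding memory object.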
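To make the idea of an overflow-checked multiplication concrete, here is a minimal sketch. It is not the runtime's MulUintptr: it uses bits.Mul64 from the standard math/bits package, which returns the full 128-bit product, and the names mulOverflow and allocSize, as well as the maxAlloc value, are assumptions made for this illustration.

```go
package main

import (
	"errors"
	"fmt"
	"math/bits"
)

// mulOverflow returns a*b and reports whether the product does not fit
// in 64 bits. bits.Mul64 yields the high and low halves of the 128-bit
// product; a non-zero high half means overflow.
func mulOverflow(a, b uint64) (uint64, bool) {
	hi, lo := bits.Mul64(a, b)
	return lo, hi != 0
}

// allocSize mirrors the shape of the check in makeslice: element size
// times capacity, rejected on overflow or when above a limit.
// maxAlloc is a caller-supplied, illustrative limit.
func allocSize(elemSize, capacity, maxAlloc uint64) (uint64, error) {
	mem, overflow := mulOverflow(elemSize, capacity)
	if overflow || mem > maxAlloc {
		return 0, errors.New("cap out of range")
	}
	return mem, nil
}

func main() {
	fmt.Println(allocSize(8, 16, 1<<30))    // fits
	fmt.Println(allocSize(8, 1<<62, 1<<30)) // overflows
}
```

Checking the overflow flag before comparing against the limit matters: a wrapped product can look small and would otherwise pass the size check.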

Growth (growslice)

A slice grows automatically on append.

// growslice handles slice growth during append.
// It is passed the slice element type, the old slice, and the desired new minimum capacity,
// and it returns a new slice with at least that capacity, with the old data
// copied into it.
// The new slice's length is set to the old slice's length,
// NOT to the new requested capacity.
// This is for codegen convenience. The old slice's length is used immediately
// to calculate where to write new values during an append.
// TODO: When the old backend is gone, reconsider this decision.
// The SSA backend might prefer the new length or to return only ptr/cap and save stack space.
func growslice(et *_type, old slice, cap int) slice {
if raceenabled {
callerpc := getcallerpc()
racereadrangepc(old.array, uintptr(old.len*int(et.size)), callerpc, funcPC(growslice))
}
if msanenabled {
msanread(old.array, uintptr(old.len*int(et.size)))
}

if cap < old.cap {
panic(errorString("growslice: cap out of range"))
}

if et.size == 0 {
// append should not create a slice with nil pointer but non-zero len.
// We assume that append doesn't need to preserve old.array in this case.
return slice{unsafe.Pointer(&zerobase), old.len, cap}
}

newcap := old.cap
doublecap := newcap + newcap
if cap > doublecap {
newcap = cap
} else {
if old.cap < 1024 {
newcap = doublecap
} else {
// Check 0 < newcap to detect overflow
// and prevent an infinite loop.
for 0 < newcap && newcap < cap {
newcap += newcap / 4
}
// Set newcap to the requested cap when
// the newcap calculation overflowed.
if newcap <= 0 {
newcap = cap
}
}
}

var overflow bool
var lenmem, newlenmem, capmem uintptr
// Specialize for common values of et.size.
// For 1 we don't need any division/multiplication.
// For sys.PtrSize, compiler will optimize division/multiplication into a shift by a constant.
// For powers of 2, use a variable shift.
switch {
case et.size == 1:
lenmem = uintptr(old.len)
newlenmem = uintptr(cap)
capmem = roundupsize(uintptr(newcap))
overflow = uintptr(newcap) > maxAlloc
newcap = int(capmem)
case et.size == sys.PtrSize:
lenmem = uintptr(old.len) * sys.PtrSize
newlenmem = uintptr(cap) * sys.PtrSize
capmem = roundupsize(uintptr(newcap) * sys.PtrSize)
overflow = uintptr(newcap) > maxAlloc/sys.PtrSize
newcap = int(capmem / sys.PtrSize)
case isPowerOfTwo(et.size):
var shift uintptr
if sys.PtrSize == 8 {
// Mask shift for better code generation.
shift = uintptr(sys.Ctz64(uint64(et.size))) & 63
} else {
shift = uintptr(sys.Ctz32(uint32(et.size))) & 31
}
lenmem = uintptr(old.len) << shift
newlenmem = uintptr(cap) << shift
capmem = roundupsize(uintptr(newcap) << shift)
overflow = uintptr(newcap) > (maxAlloc >> shift)
newcap = int(capmem >> shift)
default:
lenmem = uintptr(old.len) * et.size
newlenmem = uintptr(cap) * et.size
capmem, overflow = math.MulUintptr(et.size, uintptr(newcap))
capmem = roundupsize(capmem)
newcap = int(capmem / et.size)
}

// The check of overflow in addition to capmem > maxAlloc is needed
// to prevent an overflow which can be used to trigger a segfault
// on 32bit architectures with this example program:
//
// type T [1<<27 + 1]int64
//
// var d T
// var s []T
//
// func main() {
//   s = append(s, d, d, d, d)
//   print(len(s), "\n")
// }
if overflow || capmem > maxAlloc {
panic(errorString("growslice: cap out of range"))
}

var p unsafe.Pointer
if et.ptrdata == 0 {
p = mallocgc(capmem, nil, false)
// The append() that calls growslice is going to overwrite from old.len to cap (which will be the new length).
// Only clear the part that will not be overwritten.
memclrNoHeapPointers(add(p, newlenmem), capmem-newlenmem)
} else {
// Note: can't use rawmem (which avoids zeroing of memory), because then GC can scan uninitialized memory.
p = mallocgc(capmem, et, true)
if lenmem > 0 && writeBarrier.enabled {
// Only shade the pointers in old.array since we know the destination slice p
// only contains nil pointers because it has been cleared during alloc.
bulkBarrierPreWriteSrcOnly(uintptr(p), uintptr(old.array), lenmem-et.size+et.ptrdata)
}
}
memmove(p, old.array, lenmem)

return slice{p, old.len, newcap}
}

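When growing, if the requested capacity already exceeds twice the current capacity, the larger requested capacity is used.

If the requested capacity is no more than twice the current capacity, there are two cases:

If the current capacity is small (< 1024), the capacity is simply doubled (2x). Multiplicative growth helps avoid frequent reallocation, and while the capacity is small, any unused space wasted is small as well.
If the current capacity is already large (>= 1024), doubling could waste a lot of memory, so the capacity grows by 1.25x per step until it is no smaller than the requested capacity, which satisfies the request while limiting wasted memory.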
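The capacity rule described above can be isolated into a small function. The sketch below follows the newcap computation of the Go 1.17 growslice listing; the name nextCap is an assumption, and the later rounding to an allocator size class (roundupsize) is deliberately left out, so the capacity observed from a real append may be larger.

```go
package main

import "fmt"

// nextCap returns the capacity chosen before size-class rounding,
// given the old capacity and the minimum capacity that is needed.
func nextCap(oldCap, needed int) int {
	newCap := oldCap
	doubleCap := newCap + newCap
	if needed > doubleCap {
		// More than double is required: use the request as is.
		return needed
	}
	if oldCap < 1024 {
		// Small slices simply double.
		return doubleCap
	}
	// Large slices grow by a quarter per step until they fit.
	// The 0 < newCap test guards against integer overflow.
	for 0 < newCap && newCap < needed {
		newCap += newCap / 4
	}
	if newCap <= 0 {
		return needed
	}
	return newCap
}

func main() {
	fmt.Println(nextCap(4, 5))       // 8
	fmt.Println(nextCap(4, 10))      // 10
	fmt.Println(nextCap(1024, 1025)) // 1280
}
```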

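runtime: make slice growth formula a bit smoother
Note, however, that this growth algorithm is not necessarily optimal, and there is still room for improvement. The latest commits on the master branch show an attempt at a smoother growth function (and parameters). A high growth factor helps avoid frequent reallocation (and the potential system-call cost of allocating memory), but it also makes wasted memory more likely.

Next, the memory size capmem required by the new slice's array and the corresponding element capacity newcap are computed. (This computation is optimized for common element sizes.)

Finally, mallocgc allocates a memory object of capmem bytes, and memmove copies the data of the original slice into the memory of the new slice (pointer p).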

Understanding How the Gin Framework Works from Its Source Code

2021-08-22T13:04:24.000Z

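Gin is a high-performance web framework for Go. Using a small example project, this post reads the source code to explain how Gin starts a service and how it handles requests and responses.

Understanding How the Gin Framework Works from Its Source Code

1 Overview

Gin Web Framework
Gin is a web framework written in Go (Golang). It features a martini-like API with performance that is up to 40 times faster thanks to httprouter. If you need performance and good productivity, you will love Gin.

Gin is a high-performance web framework for Go.

LearnGin

The LearnGin repository holds the example code for this post.

The software versions used in this post are:

Gin: gin@v1.7.4;
Go: 1.17.

2 How It Works

2.1 How Gin Starts

2.1.1 The Project's main Function

The main function is in main.go in the project root: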

package main

import (
"github.com/LearnGin/handler"
"github.com/LearnGin/middleware"
"github.com/gin-gonic/gin"
)

func main() {
// init gin with default configs
r := gin.Default()

// append custom middle-wares
middleware.RegisterMiddleware(r)
// register custom routers
handler.RegisterHandler(r)

// run the engine
r.Run()
}

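Main steps:

Initialize Gin: gin.Default() initializes Gin. The default initialization attaches two middlewares:
1. Logger: the logging middleware, which prints Gin's startup and response logs to the console;
2. Recovery: the recovery middleware, which recovers from a panic raised while handling a request and responds with HTTP status code 500.
Register middleware: middleware.RegisterMiddleware(r) in this example registers the middleware developed in the project on the Gin Engine;
Register handlers: handler.RegisterHandler(r) in this example registers the project's handler functions for specific URLs on the Gin Engine;
Start Gin: r.Run() starts the Gin Engine, which begins listening for requests and serving HTTP.

2.1.2 Initializing Gin

gin's Default function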

// Default returns an Engine instance with the Logger and Recovery middleware already attached.
func Default() *Engine {
debugPrintWARNINGDefault()
engine := New()
engine.Use(Logger(), Recovery())
return engine
}

Gin's default initialization mainly creates the Engine and registers the two default middlewares.

2.1.3 Registering Middleware

package middleware

import (
"github.com/LearnGin/middleware/debug"
"github.com/gin-gonic/gin"
)

func RegisterMiddleware(r *gin.Engine) {
r.Use(debug.DebugMiddleWare())
}

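r.Use on gin.Engine registers functions of type gin.HandlerFunc as middleware. debug.DebugMiddleWare() here is a simple custom middleware written for this example: before the actual handler runs it prints detailed request information, and after the handler runs it prints the resulting status code.

The Engine.Use function

Engine.Use attaches middleware to the current router. It is located in gin.go: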

// Use attaches a global middleware to the router. ie. the middleware attached though Use() will be
// included in the handlers chain for every single request. Even 404, 405, static files...
// For example, this is the right place for a logger or error management middleware.
func (engine *Engine) Use(middleware ...HandlerFunc) IRoutes {
engine.RouterGroup.Use(middleware...)
engine.rebuild404Handlers()
engine.rebuild405Handlers()
return engine
}

The RouterGroup.Use function

The actual registration is delegated to engine.RouterGroup.Use(middleware...), which is located in routergroup.go:

// Use adds middleware to the group, see example code in GitHub.
func (group *RouterGroup) Use(middleware ...HandlerFunc) IRoutes {
group.Handlers = append(group.Handlers, middleware...)
return group.returnObj()
}

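This function is also very short: it simply appends the middleware (which is just a function) to group.Handlers, whose type is HandlersChain (in essence a slice, type HandlersChain []HandlerFunc). In other words, it collects an ordered sequence of functions in a slice.

A later section describes how c.Next(), which appears in every middleware, uses this slice for flow control.

2.1.4 Registering Handlers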

package handler

import (
"github.com/LearnGin/handler/person"
"github.com/gin-gonic/gin"
)

func RegisterHandler(r *gin.Engine) {
r.Handle("GET", "/ping", PingHandler())
r.Handle("POST", "/person/create", person.CreatePersonHandler())
}

r.Handle on gin.Engine registers a handler function for a given HTTP method and relative path.

The RouterGroup.Handle function

// Handle registers a new request handle and middleware with the given path and method.
// The last handler should be the real handler, the other ones should be middleware that can and should be shared among different routes.
// See the example code in GitHub.
//
// For GET, POST, PUT, PATCH and DELETE requests the respective shortcut
// functions can be used.
//
// This function is intended for bulk loading and to allow the usage of less
// frequently used, non-standardized or custom methods (e.g. for internal
// communication with a proxy).
func (group *RouterGroup) Handle(httpMethod, relativePath string, handlers ...HandlerFunc) IRoutes {
if matches, err := regexp.MatchString("^[A-Z]+$", httpMethod); !matches || err != nil {
panic("http method " + httpMethod + " is not valid")
}
return group.handle(httpMethod, relativePath, handlers)
}

Calling Handle on the Gin Engine actually calls the Handle function of its embedded RouterGroup field. The function's logic is delegated to the handle function:

func (group *RouterGroup) handle(httpMethod, relativePath string, handlers HandlersChain) IRoutes {
absolutePath := group.calculateAbsolutePath(relativePath)
handlers = group.combineHandlers(handlers)
group.engine.addRoute(httpMethod, absolutePath, handlers)
return group.returnObj()
}

As shown, the handlers are actually registered as a route by group.engine.addRoute(httpMethod, absolutePath, handlers).
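The reason r.Handle can be called on the Engine although Handle is defined on RouterGroup is Go's method promotion through struct embedding. The following sketch shows the mechanism in isolation; the types group and engine are simplified stand-ins, not gin's real definitions.

```go
package main

import "fmt"

// group plays the role of gin's RouterGroup in this sketch.
type group struct {
	routes []string
}

// Handle records a route on the group.
func (g *group) Handle(method, path string) {
	g.routes = append(g.routes, method+" "+path)
}

// engine embeds group, so group's fields and methods are promoted
// and can be used directly on an engine value.
type engine struct {
	group
}

func main() {
	e := &engine{}
	e.Handle("GET", "/ping")                 // promoted method
	e.group.Handle("POST", "/person/create") // the explicit form
	fmt.Println(e.routes)
}
```

Both calls end up in the same method with the same receiver, which is why the tutorial code can treat the Engine as if it were a router group.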

The Engine.addRoute function

Tracing further, addRoute adds the handlers to the routing tree that belongs to the given HTTP method.


func (engine *Engine) addRoute(method, path string, handlers HandlersChain) {
assert1(path[0] == '/', "path must begin with '/'")
assert1(method != "", "HTTP method can not be empty")
assert1(len(handlers) > 0, "there must be at least one handler")

debugPrintRoute(method, path, handlers)

root := engine.trees.get(method)
if root == nil {
root = new(node)
root.fullPath = "/"
engine.trees = append(engine.trees, methodTree{method: method, root: root})
}
root.addRoute(path, handlers)

// Update maxParams
if paramsCount := countParams(path); paramsCount > engine.maxParams {
engine.maxParams = paramsCount
}
}

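The routes of each HTTP method (such as GET or POST) are maintained in a separate tree. The tree's data model and functions are implemented in gin/tree.go and are not covered further here.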
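The idea of keeping one routing structure per HTTP method can be sketched as follows. This is a deliberate simplification: gin stores each method's routes in a radix tree that supports parameters and wildcards, whereas the hypothetical methodTable below uses an exact-match map. All names in the sketch are assumptions.

```go
package main

import "fmt"

type handler func() string

// methodTable stands in for gin's methodTree: one entry per HTTP
// method. A plain map replaces the radix tree for brevity.
type methodTable struct {
	method string
	routes map[string]handler
}

type router struct {
	tables []methodTable
}

// add registers h under method and path, creating the per-method
// table on first use, as addRoute does with its tree root.
func (r *router) add(method, path string, h handler) {
	for i := range r.tables {
		if r.tables[i].method == method {
			r.tables[i].routes[path] = h
			return
		}
	}
	r.tables = append(r.tables, methodTable{
		method: method,
		routes: map[string]handler{path: h},
	})
}

// find first selects the table for the method, then looks up the path.
func (r *router) find(method, path string) (handler, bool) {
	for _, t := range r.tables {
		if t.method == method {
			h, ok := t.routes[path]
			return h, ok
		}
	}
	return nil, false
}

func main() {
	r := &router{}
	r.add("GET", "/ping", func() string { return "pong" })
	if h, ok := r.find("GET", "/ping"); ok {
		fmt.Println(h())
	}
	_, ok := r.find("POST", "/ping")
	fmt.Println(ok)
}
```

Separating the structures by method keeps each lookup small and makes "path exists under another method" easy to detect, which is what the 405 handling relies on.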

2.1.5 Starting Gin

The Engine.Run function

// Run attaches the router to a http.Server and starts listening and serving HTTP requests.
// It is a shortcut for http.ListenAndServe(addr, router)
// Note: this method will block the calling goroutine indefinitely unless an error happens.
func (engine *Engine) Run(addr ...string) (err error) {
defer func() { debugPrintError(err) }()

trustedCIDRs, err := engine.prepareTrustedCIDRs()
if err != nil {
return err
}
engine.trustedCIDRs = trustedCIDRs
address := resolveAddress(addr)
debugPrint("Listening and serving HTTP on %s\n", address)
err = http.ListenAndServe(address, engine)
return
}

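As shown, Engine.Run mainly does two things:

resolves the listen address argument;
starts listening and serving.

The core listening and serving is done by calling http.ListenAndServe from Go's standard net/http package.

net/http's ListenAndServe function

Gin's network layer is built on Go's standard net/http package.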

// ListenAndServe listens on the TCP network address addr and then calls
// Serve with handler to handle requests on incoming connections.
// Accepted connections are configured to enable TCP keep-alives.
//
// The handler is typically nil, in which case the DefaultServeMux is used.
//
// ListenAndServe always returns a non-nil error.
func ListenAndServe(addr string, handler Handler) error {
server := &Server{Addr: addr, Handler: handler}
return server.ListenAndServe()
}

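This function instantiates a Server and calls its ListenAndServe method to listen and serve.

Note: at this point the Gin Engine object is passed to the net/http Server as a value of the Handler interface, and the Server later calls it when handling network requests.

net/http's Handler interface

The Server struct in net/http has a field named Handler whose type is the Handler interface.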

// A Server defines parameters for running an HTTP server.
// The zero value for Server is a valid configuration.
type Server struct {
// Addr optionally specifies the TCP address for the server to listen on,
// in the form "host:port". If empty, ":http" (port 80) is used.
// The service names are defined in RFC 6335 and assigned by IANA.
// See net.Dial for details of the address format.
Addr string

Handler Handler // handler to invoke, http.DefaultServeMux if nil
    
    // ...
}

The Handler interface declares exactly one method, ServeHTTP. It is defined as follows:

// A Handler responds to an HTTP request.
//
// ServeHTTP should write reply headers and data to the ResponseWriter
// and then return. Returning signals that the request is finished; it
// is not valid to use the ResponseWriter or read from the
// Request.Body after or concurrently with the completion of the
// ServeHTTP call.
//
// Depending on the HTTP client software, HTTP protocol version, and
// any intermediaries between the client and the Go server, it may not
// be possible to read from the Request.Body after writing to the
// ResponseWriter. Cautious handlers should read the Request.Body
// first, and then reply.
//
// Except for reading the body, handlers should not modify the
// provided Request.
//
// If ServeHTTP panics, the server (the caller of ServeHTTP) assumes
// that the effect of the panic was isolated to the active request.
// It recovers the panic, logs a stack trace to the server error log,
// and either closes the network connection or sends an HTTP/2
// RST_STREAM, depending on the HTTP protocol. To abort a handler so
// the client sees an interrupted response but the server doesn't log
// an error, panic with the value ErrAbortHandler.
type Handler interface {
ServeHTTP(ResponseWriter, *Request)
}

The point of the Handler interface is that any type that implements ServeHTTP satisfies Handler and can therefore be used as a Server's Handler, to be called during HTTP processing.
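As a small illustration of how little is needed to satisfy http.Handler, the sketch below defines a hypothetical type greeter with a ServeHTTP method and drives it with an in-memory request from net/http/httptest, so no network port is opened. The type and helper names are assumptions made for this example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// greeter is a hypothetical type. Having a ServeHTTP method with this
// signature is all it takes to satisfy http.Handler.
type greeter struct {
	name string
}

func (g greeter) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "hello, %s", g.name)
}

// serveOnce passes a single in-memory request to h and returns the
// recorded status code and body.
func serveOnce(h http.Handler, method, target string) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(method, target, nil)
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := serveOnce(greeter{name: "gin"}, "GET", "/ping")
	fmt.Println(code, body)
}
```

A value such as greeter{name: "gin"} could equally be passed as the second argument of http.ListenAndServe, which is the position gin.Engine occupies in Engine.Run.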

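gin.Engine implements the ServeHTTP method of net/http's Handler interface (gin/gin.go). How it does so is described next.

2.2 The Request and Response Process

2.2.1 Listening for and Accepting Requests

net/http's Server.ListenAndServe function

As described above, gin calls net/http's ListenAndServe to listen on the network and handle requests. The work is done by Server.ListenAndServe in net/http/server.go: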

// ListenAndServe listens on the TCP network address srv.Addr and then
// calls Serve to handle requests on incoming connections.
// Accepted connections are configured to enable TCP keep-alives.
//
// If srv.Addr is blank, ":http" is used.
//
// ListenAndServe always returns a non-nil error. After Shutdown or Close,
// the returned error is ErrServerClosed.
func (srv *Server) ListenAndServe() error {
if srv.shuttingDown() {
return ErrServerClosed
}
addr := srv.Addr
if addr == "" {
addr = ":http"
}
ln, err := net.Listen("tcp", addr)
if err != nil {
return err
}
return srv.Serve(ln)
}

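As shown, Server.ListenAndServe mainly does two things:

Set up listening: net.Listen("tcp", addr) starts listening on the given address;
Accept and handle requests: srv.Serve(ln) accepts connections on the listener and responds to the requests that arrive on them.

net/http's Server.Serve function

Server.Serve listens for, accepts and handles network requests: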

// Serve accepts incoming connections on the Listener l, creating a
// new service goroutine for each. The service goroutines read requests and
// then call srv.Handler to reply to them.
//
// HTTP/2 support is only enabled if the Listener returns *tls.Conn
// connections and they were configured with "h2" in the TLS
// Config.NextProtos.
//
// Serve always returns a non-nil error and closes l.
// After Shutdown or Close, the returned error is ErrServerClosed.
func (srv *Server) Serve(l net.Listener) error {
if fn := testHookServerServe; fn != nil {
fn(srv, l) // call hook with unwrapped listener
}

origListener := l
l = &onceCloseListener{Listener: l}
defer l.Close()

if err := srv.setupHTTP2_Serve(); err != nil {
return err
}

if !srv.trackListener(&l, true) {
return ErrServerClosed
}
defer srv.trackListener(&l, false)

baseCtx := context.Background()
if srv.BaseContext != nil {
baseCtx = srv.BaseContext(origListener)
if baseCtx == nil {
panic("BaseContext returned a nil context")
}
}

var tempDelay time.Duration // how long to sleep on accept failure

ctx := context.WithValue(baseCtx, ServerContextKey, srv)
for {
rw, err := l.Accept()
if err != nil {
select {
case <-srv.getDoneChan():
return ErrServerClosed
default:
}
if ne, ok := err.(net.Error); ok && ne.Temporary() {
if tempDelay == 0 {
tempDelay = 5 * time.Millisecond
} else {
tempDelay *= 2
}
if max := 1 * time.Second; tempDelay > max {
tempDelay = max
}
srv.logf("http: Accept error: %v; retrying in %v", err, tempDelay)
time.Sleep(tempDelay)
continue
}
return err
}
connCtx := ctx
if cc := srv.ConnContext; cc != nil {
connCtx = cc(connCtx, rw)
if connCtx == nil {
panic("ConnContext returned nil")
}
}
tempDelay = 0
c := srv.newConn(rw)
c.setState(c.rwc, StateNew, runHooks) // before Serve can return
go c.serve(connCtx)
}
}

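Server.Serve runs an unconditional for loop so that it keeps accepting and handling network requests. The main flow is:

Accept: l.Accept() blocks while there are no incoming connections; when one arrives, it accepts it and returns the established connection;
Handle: a goroutine is started that processes the connection with conn's serve function (go c.serve(connCtx)).

net/http's conn.serve function

An accepted connection is then handled by conn.serve. The function is fairly long: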

// Serve a new connection.
func (c *conn) serve(ctx context.Context) {
c.remoteAddr = c.rwc.RemoteAddr().String()
ctx = context.WithValue(ctx, LocalAddrContextKey, c.rwc.LocalAddr())
defer func() {
if err := recover(); err != nil && err != ErrAbortHandler {
const size = 64 << 10
buf := make([]byte, size)
buf = buf[:runtime.Stack(buf, false)]
c.server.logf("http: panic serving %v: %v\n%s", c.remoteAddr, err, buf)
}
if !c.hijacked() {
c.close()
c.setState(c.rwc, StateClosed, runHooks)
}
}()

if tlsConn, ok := c.rwc.(*tls.Conn); ok {
if d := c.server.ReadTimeout; d > 0 {
c.rwc.SetReadDeadline(time.Now().Add(d))
}
if d := c.server.WriteTimeout; d > 0 {
c.rwc.SetWriteDeadline(time.Now().Add(d))
}
if err := tlsConn.HandshakeContext(ctx); err != nil {
// If the handshake failed due to the client not speaking
// TLS, assume they're speaking plaintext HTTP and write a
// 400 response on the TLS conn's underlying net.Conn.
if re, ok := err.(tls.RecordHeaderError); ok && re.Conn != nil && tlsRecordHeaderLooksLikeHTTP(re.RecordHeader) {
io.WriteString(re.Conn, "HTTP/1.0 400 Bad Request\r\n\r\nClient sent an HTTP request to an HTTPS server.\n")
re.Conn.Close()
return
}
c.server.logf("http: TLS handshake error from %s: %v", c.rwc.RemoteAddr(), err)
return
}
c.tlsState = new(tls.ConnectionState)
*c.tlsState = tlsConn.ConnectionState()
if proto := c.tlsState.NegotiatedProtocol; validNextProto(proto) {
if fn := c.server.TLSNextProto[proto]; fn != nil {
h := initALPNRequest{ctx, tlsConn, serverHandler{c.server}}
// Mark freshly created HTTP/2 as active and prevent any server state hooks
// from being run on these connections. This prevents closeIdleConns from
// closing such connections. See issue https://golang.org/issue/39776.
c.setState(c.rwc, StateActive, skipHooks)
fn(c.server, tlsConn, h)
}
return
}
}

// HTTP/1.x from here on.

ctx, cancelCtx := context.WithCancel(ctx)
c.cancelCtx = cancelCtx
defer cancelCtx()

c.r = &connReader{conn: c}
c.bufr = newBufioReader(c.r)
c.bufw = newBufioWriterSize(checkConnErrorWriter{c}, 4<<10)

for {
w, err := c.readRequest(ctx)
if c.r.remain != c.server.initialReadLimitSize() {
// If we read any bytes off the wire, we're active.
c.setState(c.rwc, StateActive, runHooks)
}
if err != nil {
const errorHeaders = "\r\nContent-Type: text/plain; charset=utf-8\r\nConnection: close\r\n\r\n"

switch {
case err == errTooLarge:
// Their HTTP client may or may not be
// able to read this if we're
// responding to them and hanging up
// while they're still writing their
// request. Undefined behavior.
const publicErr = "431 Request Header Fields Too Large"
fmt.Fprintf(c.rwc, "HTTP/1.1 "+publicErr+errorHeaders+publicErr)
c.closeWriteAndWait()
return

case isUnsupportedTEError(err):
// Respond as per RFC 7230 Section 3.3.1 which says,
//      A server that receives a request message with a
//      transfer coding it does not understand SHOULD
//      respond with 501 (Unimplemented).
code := StatusNotImplemented

// We purposefully aren't echoing back the transfer-encoding's value,
// so as to mitigate the risk of cross side scripting by an attacker.
fmt.Fprintf(c.rwc, "HTTP/1.1 %d %s%sUnsupported transfer encoding", code, StatusText(code), errorHeaders)
return

case isCommonNetReadError(err):
return // don't reply

default:
if v, ok := err.(statusError); ok {
fmt.Fprintf(c.rwc, "HTTP/1.1 %d %s: %s%s%d %s: %s", v.code, StatusText(v.code), v.text, errorHeaders, v.code, StatusText(v.code), v.text)
return
}
publicErr := "400 Bad Request"
fmt.Fprintf(c.rwc, "HTTP/1.1 "+publicErr+errorHeaders+publicErr)
return
}
}

// Expect 100 Continue support
req := w.req
if req.expectsContinue() {
if req.ProtoAtLeast(1, 1) && req.ContentLength != 0 {
// Wrap the Body reader with one that replies on the connection
req.Body = &expectContinueReader{readCloser: req.Body, resp: w}
w.canWriteContinue.setTrue()
}
} else if req.Header.get("Expect") != "" {
w.sendExpectationFailed()
return
}

c.curReq.Store(w)

if requestBodyRemains(req.Body) {
registerOnHitEOF(req.Body, w.conn.r.startBackgroundRead)
} else {
w.conn.r.startBackgroundRead()
}

// HTTP cannot have multiple simultaneous active requests.[*]
// Until the server replies to this request, it can't read another,
// so we might as well run the handler in this goroutine.
// [*] Not strictly true: HTTP pipelining. We could let them all process
// in parallel even if their responses need to be serialized.
// But we're not going to implement HTTP pipelining because it
// was never deployed in the wild and the answer is HTTP/2.
serverHandler{c.server}.ServeHTTP(w, w.req)
w.cancelCtx()
if c.hijacked() {
return
}
w.finishRequest()
if !w.shouldReuseConnection() {
if w.requestBodyLimitHit || w.closedRequestBodyEarly() {
c.closeWriteAndWait()
}
return
}
c.setState(c.rwc, StateIdle, runHooks)
c.curReq.Store((*response)(nil))

if !w.conn.server.doKeepAlives() {
// We're in shutdown mode. We might've replied
// to the user without "Connection: close" and
// they might think they can send another
// request, but such is life with HTTP/1.1.
return
}

if d := c.server.idleTimeout(); d != 0 {
c.rwc.SetReadDeadline(time.Now().Add(d))
if _, err := c.bufr.Peek(4); err != nil {
return
}
}
c.rwc.SetReadDeadline(time.Time{})
}
}

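Although conn.serve is long, the main handling of a connection is done by the call serverHandler{c.server}.ServeHTTP(w, w.req).

This call first wraps the Server in a serverHandler value and then calls that value's ServeHTTP method, which dispatches the request to the Server's Handler (here, the gin.Engine) to produce the response. As mentioned above, implementing ServeHTTP is all it takes to satisfy the Handler interface. By implementing that interface, Gin plugs its own functionality into the standard net/http package.

gin's Engine.ServeHTTP function

gin implements ServeHTTP in gin.go: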

// ServeHTTP conforms to the http.Handler interface.
func (engine *Engine) ServeHTTP(w http.ResponseWriter, req *http.Request) {
c := engine.pool.Get().(*Context)
c.writermem.reset(w)
c.Request = req
c.reset()

engine.handleHTTPRequest(c)

engine.pool.Put(c)
}

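Main steps:

Set up the request context: take a Context object from the pool and fill in the http.ResponseWriter and http.Request of the current request;
Handle the request: pass the request, in the form of the Context object, to engine.handleHTTPRequest(c), which encapsulates the processing;
Recycle the request context: once processing is finished, return the Context object to the pool.

Note that Gin pools the Context object that every request needs. The pool avoids the cost of repeatedly allocating and freeing memory that would otherwise come from creating and destroying Context objects under highly concurrent load.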
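The get, reset, use, put cycle described above can be reproduced with sync.Pool from the standard library. The following is a minimal sketch under stated assumptions: requestContext and handle are hypothetical names, and the struct carries only two fields instead of gin's full Context.

```go
package main

import (
	"fmt"
	"sync"
)

// requestContext is a much reduced stand-in for gin.Context.
type requestContext struct {
	path  string
	index int
}

// reset clears state left over from the previous request, which is
// required because pooled objects are reused as they are.
func (c *requestContext) reset() {
	c.path = ""
	c.index = -1
}

// pool creates a new context only when no recycled one is available.
var pool = sync.Pool{
	New: func() interface{} { return new(requestContext) },
}

// handle follows the same three steps as Engine.ServeHTTP:
// take a context from the pool, use it, and put it back.
func handle(path string) string {
	c := pool.Get().(*requestContext)
	c.reset()
	c.path = path

	result := "handled " + c.path

	pool.Put(c)
	return result
}

func main() {
	fmt.Println(handle("/ping"))
	fmt.Println(handle("/person/create"))
}
```

Resetting before use rather than after is a defensive choice in this sketch: it guarantees a clean object even if a previous user returned it in an unexpected state.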

gin's Engine.handleHTTPRequest function

handleHTTPRequest encapsulates the actual processing of a request. It is located in gin/gin.go:

func (engine *Engine) handleHTTPRequest(c *Context) {
httpMethod := c.Request.Method
rPath := c.Request.URL.Path
unescape := false
if engine.UseRawPath && len(c.Request.URL.RawPath) > 0 {
rPath = c.Request.URL.RawPath
unescape = engine.UnescapePathValues
}

if engine.RemoveExtraSlash {
rPath = cleanPath(rPath)
}

// Find root of the tree for the given HTTP method
t := engine.trees
for i, tl := 0, len(t); i < tl; i++ {
if t[i].method != httpMethod {
continue
}
root := t[i].root
// Find route in tree
value := root.getValue(rPath, c.params, unescape)
if value.params != nil {
c.Params = *value.params
}
if value.handlers != nil {
c.handlers = value.handlers
c.fullPath = value.fullPath
c.Next()
c.writermem.WriteHeaderNow()
return
}
if httpMethod != "CONNECT" && rPath != "/" {
if value.tsr && engine.RedirectTrailingSlash {
redirectTrailingSlash(c)
return
}
if engine.RedirectFixedPath && redirectFixedPath(c, root, engine.RedirectFixedPath) {
return
}
}
break
}

if engine.HandleMethodNotAllowed {
for _, tree := range engine.trees {
if tree.method == httpMethod {
continue
}
if value := tree.root.getValue(rPath, nil, unescape); value.handlers != nil {
c.handlers = engine.allNoMethod
serveError(c, http.StatusMethodNotAllowed, default405Body)
return
}
}
}
c.handlers = engine.allNoRoute
serveError(c, http.StatusNotFound, default404Body)
}

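The main processing in Engine.handleHTTPRequest happens in the for loop in the middle:

Iterate over engine.trees to find the routing tree for the HTTP method of the current request;
Look up, in that tree, the entry value that matches the path and parameters of the current request;
Store the handler chain found (gin.HandlersChain) in c.handlers of the current request context;
Call c.Next(), which invokes the next function in the handlers chain (middleware or business handler) and starts building a LIFO call stack;
After every call on the stack has returned, c.writermem.WriteHeaderNow() writes the HTTP status code to the response header based on the context.

2.2.2 Middleware and Handlers

An incoming request is processed by middleware and by the business-logic handler. In Gin, middleware and business-logic functions are both gin.HandlerFunc functions.

For example, if two middlewares (MiddleWareA and MiddleWareB) are added to gin.Engine and a function Hello is registered as the route handler for GET /hello, execution proceeds as follows:

handleHTTPRequest above reaches c.Next(), which calls MiddleWareA;
MiddleWareA reaches c.Next(), which calls MiddleWareB;
MiddleWareB reaches c.Next(), which calls Hello;
Hello returns, and MiddleWareB continues until it returns;
MiddleWareA continues until it returns.

gin's Context.Next function

c.Next(), which middleware calls so often, is one of the flow-control functions gin provides for middleware. It is located in gin/context.go: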

/************************************/
/*********** FLOW CONTROL ***********/
/************************************/

// Next should be used only inside middleware.
// It executes the pending handlers in the chain inside the calling handler.
// See example in GitHub.
func (c *Context) Next() {
c.index++
for c.index < int8(len(c.handlers)) {
c.handlers[c.index](c)
c.index++
}
}

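Next calls the next HandlerFunc from within the current middleware function. Calling the HandlerFuncs of the HandlersChain in order builds a call stack: functions are pushed in order as they are called until the last one returns, and they are then popped in LIFO order, which naturally produces the LIFO middleware execution order described above.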
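The nesting produced by Next can be reproduced without gin. The sketch below copies the index-and-loop logic of Context.Next into a hypothetical chainContext type and records the order in which two middlewares and a final handler run; every name in it is an assumption made for this illustration.

```go
package main

import "fmt"

// chainContext is a reduced stand-in for gin.Context: a handler chain,
// the index of the handler currently running, and a trace of events.
type chainContext struct {
	handlers []func(*chainContext)
	index    int
	trace    []string
}

// Next has the same shape as gin's Context.Next: advance the index and
// run the remaining handlers from inside the calling handler.
func (c *chainContext) Next() {
	c.index++
	for c.index < len(c.handlers) {
		c.handlers[c.index](c)
		c.index++
	}
}

// middleware returns a handler that records when it starts and when
// control comes back to it after the rest of the chain has run.
func middleware(name string) func(*chainContext) {
	return func(c *chainContext) {
		c.trace = append(c.trace, name+" before")
		c.Next()
		c.trace = append(c.trace, name+" after")
	}
}

// run builds the chain A, B, Hello and executes it once.
func run() []string {
	c := &chainContext{index: -1}
	c.handlers = []func(*chainContext){
		middleware("A"),
		middleware("B"),
		func(c *chainContext) { c.trace = append(c.trace, "Hello") },
	}
	c.Next()
	return c.trace
}

func main() {
	for _, step := range run() {
		fmt.Println(step)
	}
}
```

The trace comes out as A before, B before, Hello, B after, A after. The shared index is what prevents a handler from running twice: by the time control returns to an outer loop, the index has already moved past the end of the chain.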

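2.2.3 Request Handling and Response

For this example I wrote a simple API that creates a Person; it involves a model definition and business logic.

Model Definition

The model is defined in /model/person.go: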

package model

import "time"

type Person struct {
Name  string `json:"name"`
Phone string `json:"phone"`
Age   uint64 `json:"age"`
}

type CreatePersonRequest struct {
Person Person `json:"person"`
}

type CreatePersonResponse struct {
Person   Person        `json:"person"`
Elapse   time.Duration `json:"elapse"` // nano seconds
BaseResp BaseResp      `json:"baseresp"`
}

BaseResp is defined in /model/base.go:

package model

type BaseResp struct {
Code    int64  `json:"code"`
Message string `json:"message"`
}

Business Logic

The business-logic function is in handler/person/create_person.go:

package person

import (
"fmt"
"time"

"github.com/LearnGin/model"
"github.com/gin-gonic/gin"
"github.com/gin-gonic/gin/binding"
)

func CreatePersonHandler() gin.HandlerFunc {
return func(c *gin.Context) {
// parse request
tic := time.Now()
req := new(model.CreatePersonRequest)
err := c.ShouldBindWith(req, binding.JSON)
if err != nil {
// fmt.Errorf only builds an error value; print the message instead of discarding it
fmt.Printf("Can not bind with model.CreatePersonRequest, err: %+v\n", err)
resp := new(model.CreatePersonResponse)
resp.Elapse = time.Since(tic)
resp.BaseResp = model.BaseResp{
Code:    1,
Message: fmt.Sprintf("create person failed in binding json, err: %s", err.Error()),
}
c.JSON(200, resp)
return
}

// process request
fmt.Printf("Creating Person: %+v\n", req.Person)

// jsonify response
resp := new(model.CreatePersonResponse)
resp.Person = req.Person
resp.Elapse = time.Since(tic)
resp.BaseResp = model.BaseResp{
Code:    0,
Message: "success",
}
c.JSON(200, resp)
}
}

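Processing has three main steps:

Parse the request: c.ShouldBindWith(req, binding.JSON) parses the JSON sent in the request and binds the result to the given struct;
Business processing: here it only prints the data;
Send the response: instantiate the response struct and serialize it to JSON as the response.

Note that serialization and deserialization here follow the struct field tags, where present.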
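The effect of the field tags can be seen with the standard encoding/json package alone. The sketch below reuses the Person struct from the model above; the helper names decode and encode and the sample data are assumptions, and gin's binding layer is not involved.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Person mirrors the model used in the example project. The json tags
// decide which JSON keys the fields are read from and written to.
type Person struct {
	Name  string `json:"name"`
	Phone string `json:"phone"`
	Age   uint64 `json:"age"`
}

// decode parses a JSON document into a Person.
func decode(data string) (Person, error) {
	var p Person
	err := json.Unmarshal([]byte(data), &p)
	return p, err
}

// encode serializes a Person to JSON.
func encode(p Person) (string, error) {
	b, err := json.Marshal(p)
	return string(b), err
}

func main() {
	p, err := decode(`{"name":"Ann","phone":"123","age":30}`)
	fmt.Println(p, err)

	s, err := encode(p)
	fmt.Println(s, err)
}
```

Without the tags, the encoder would emit the Go field names (Name, Phone, Age) as keys, so the tags are what keep the wire format lowercase.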

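3 Summary

Reading the core of the Gin framework together with the parts of the Go source it calls leads to the following observations:

Gin builds the framework above the network communication layer, while the network communication itself is handled entirely by Go's net/http package;
By implementing an interface that Go provides, Gin plugs into the standard library with little effort, so that the upper-level application and the lower-level implementation do not depend on each other directly, which reflects the dependency inversion principle of SOLID;
Gin optimizes for the high-concurrency problems common to HTTP web frameworks; for example, pooling Context objects avoids the cost of frequent memory allocation and release under highly concurrent load;
Gin abstracts both middleware and business logic as gin.HandlerFunc functions; executing them is simply executing the call stack formed by invoking a sequence of functions in order.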

TCMalloc - How Go Allocates Memory

2021-08-10T15:31:15.000Z

Go's memory allocator is largely based on TCMalloc. These notes summarize how it works, based on the article TCMalloc: Thread-Caching Malloc.

TCMalloc - How Go Allocates Memory

This post is based on:

TCMalloc: Thread-Caching Malloc
Sanjay Ghemawat, Paul Menage opensource@google.com

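1 Introduction

TCMalloc (Thread-Caching Malloc) is a thread-caching memory allocator released by Google. TCMalloc caches some allocatable memory for every thread, so in multi-threaded programs it avoids, as far as possible, lock contention when several threads allocate or free memory at the same time. This makes allocation and deallocation faster than with other allocators. TCMalloc also has the advantage of high memory utilization.

2 How It Works

TCMalloc allocates memory through a two-level structure made up of the Thread Cache and the Central Heap.

When a thread allocates memory, TCMalloc takes a block of the appropriate size from that thread's thread cache. Memory that a thread releases back into its thread cache is in turn moved back to the central heap by garbage collection. Specifically, TCMalloc distinguishes two cases:

Small object allocation (32 KB or less)
Large object allocation (more than 32 KB)

2.1 Small Object Allocation

When a thread requests a small object of at most 32 KB, the thread cache hands out a block of the appropriate size.

The thread cache maintains an array of singly linked lists. Each array entry represents one allocatable size, in ascending order (about 170 sizes in total), and each size keeps its free memory objects in a singly linked list.

When a thread requests memory:

First, find the appropriate size for the request (for example, requests of 961 to 1024 bytes all receive a 1024-byte object);
Check the linked list for that size for a free block:
1. If an object of that size is available, take the first free block from the list and give it to the thread;
2. If no object of that size is available, fetch some memory from the central heap:
  1. If the central heap has objects of that size, take a batch of them, add them to the list for that size, and then hand one object from the refilled list to the thread;
  2. If the central heap has run out of that size as well, it must be replenished:
    1. Allocate pages through the central page allocator;
    2. Split the allocated pages into a series of objects of that size;
    3. Add those objects to the central heap's list for that size;
    4. Now that the central heap has been replenished, move a batch into the requesting thread's cache as usual and allocate from there.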
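The lookup order described in the steps above (thread cache, then central heap, then page allocator) can be sketched as a toy model. The sketch makes several assumptions that are not part of TCMalloc: the size-class table is a short hypothetical one instead of the roughly 170 real classes, free lists are represented by counters rather than linked lists, one 4 KB page is carved per refill, and the batch size of 4 is arbitrary.

```go
package main

import "fmt"

// sizeClasses is a hypothetical, shortened size-class table.
var sizeClasses = []int{8, 16, 32, 64, 128, 256, 512, 1024}

const (
	pageSize = 4096 // bytes carved per refill from the page allocator
	batch    = 4    // objects moved from central heap to thread cache (assumed)
)

// classFor returns the index of the smallest class that fits size,
// or -1 if the request is too large for this table.
func classFor(size int) int {
	for i, c := range sizeClasses {
		if size <= c {
			return i
		}
	}
	return -1
}

// allocator counts free objects per size class in each tier.
type allocator struct {
	thread  []int
	central []int
}

func newAllocator() *allocator {
	n := len(sizeClasses)
	return &allocator{thread: make([]int, n), central: make([]int, n)}
}

// alloc returns the object size handed out and the deepest tier that
// had to be consulted to satisfy the request.
func (a *allocator) alloc(size int) (int, string) {
	i := classFor(size)
	if i < 0 {
		return 0, "large object path"
	}
	source := "thread cache"
	if a.thread[i] == 0 {
		source = "central heap"
		if a.central[i] == 0 {
			// Split one page into objects of this class.
			a.central[i] += pageSize / sizeClasses[i]
			source = "page allocator"
		}
		n := batch
		if n > a.central[i] {
			n = a.central[i]
		}
		a.central[i] -= n
		a.thread[i] += n
	}
	a.thread[i]--
	return sizeClasses[i], source
}

func main() {
	a := newAllocator()
	fmt.Println(a.alloc(1000)) // first request has to go all the way down
	fmt.Println(a.alloc(961))  // second request is served from the thread cache
}
```

The point of the batching is visible in the second call: after one slow refill, later requests of the same class are satisfied from the thread cache without touching shared state.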

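2.2 Large Object Allocation

Large objects above 32 KB are allocated in units of 4 KB pages, directly from the central heap. The central heap keeps free runs of pages in linked lists: one list each for runs of 1 to 255 pages, and a single rest list for runs longer than 255 pages.

When memory is requested, look up the list for the required number of pages:

If that list has a free run, allocate it;
If it is empty, look in the next larger list,
1. If a run is found, allocate from it and put the leftover pages into the list that matches their length;
2. If no run in the central heap is large enough, request memory from the operating system to replenish the central heap.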
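A minimal Python sketch of this page-level search follows. It assumes page runs are identified by integer start pages, that the heap grows by at least 128 pages at a time (an arbitrary choice), and it ignores coalescing; `PageHeap` and its fields are hypothetical names.

```python
MAX_PAGES = 255  # lists 1..255 hold exact lengths; longer runs go to `rest`


class PageHeap:
    def __init__(self):
        self.lists = {n: [] for n in range(1, MAX_PAGES + 1)}  # length -> start pages
        self.rest = []    # (start, length) runs longer than MAX_PAGES
        self.os_next = 0  # stand-in for memory obtained from the OS

    def _insert(self, start, length):
        """File a free run under the list that matches its length."""
        if length == 0:
            return
        if length <= MAX_PAGES:
            self.lists[length].append(start)
        else:
            self.rest.append((start, length))

    def alloc(self, npages):
        """Return the start page of a free run of `npages` pages."""
        # 1. exact-size list first, then each larger list in turn
        for length in range(npages, MAX_PAGES + 1):
            if self.lists[length]:
                start = self.lists[length].pop()
                self._insert(start + npages, length - npages)  # keep the leftover
                return start
        # 2. fall back to the `rest` list
        for i, (start, length) in enumerate(self.rest):
            if length >= npages:
                del self.rest[i]
                self._insert(start + npages, length - npages)
                return start
        # 3. nothing fits: grow the heap and retry
        grow = max(npages, 128)
        self._insert(self.os_next, grow)
        self.os_next += grow
        return self.alloc(npages)
```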

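2.3 Spans

TCMalloc organizes pages into span objects. A span represents a run of contiguous pages.

A lookup structure maps page numbers to span addresses:

On a 32-bit system, a 32-bit address can reach $2^{32}$ bytes of memory. With 4 KB pages that gives $2^{20}$, i.e. 1M, page numbers. Each span address is 32 bits, i.e. 4 bytes, so a 4 MB array is enough to map every page number to its span.

On a 64-bit system the address space is far larger, so a 3-level radix tree is used to map page numbers to span addresses.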
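The 32-bit arithmetic can be checked with a few lines of Python. The constant and function names are illustrative only.

```python
PAGE_SHIFT = 12          # 4 KB pages -> 2**12 bytes per page
ADDRESS_BITS = 32
SPAN_POINTER_BYTES = 4   # one 32-bit span address per page number

num_pages = 2 ** (ADDRESS_BITS - PAGE_SHIFT)   # number of page numbers
table_bytes = num_pages * SPAN_POINTER_BYTES   # size of the flat lookup array


def page_number(address):
    """Page number that contains a byte address."""
    return address >> PAGE_SHIFT
```

Here `num_pages` is 2**20 and `table_bytes` is 4 MiB, matching the figures in the text.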

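2.4 Deallocation

When an object is freed, its page number is used to look up the corresponding span, which tells the allocator what kind of object it is:

If it is a small object, insert it into the free list of the matching size class in the current thread's cache.
1. If the thread cache now exceeds its size limit (2 MB by default), run the garbage collector to return unused objects from the thread cache to the central heap's free lists.
If it is a large object, coalesce the span's pages with adjacent free pages and return them to the central heap's free lists.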

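2.5 Garbage Collection of Thread Caches

When the free memory in a thread cache exceeds a threshold (2 MB by default), garbage collection is triggered and memory moves from the thread cache back to the central heap's free lists.

As the number of threads grows, the threshold shrinks so that a large number of threads does not waste memory.

The garbage collector tracks a low-water mark L for every free list in the thread cache: the shortest length the list has had since the last collection. Each collection moves L/2 objects from the free list to the central heap. At this rate, a free list that has gone unused for a long time drains quickly into the central free lists, where other threads that need the memory can get it.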
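The low-water-mark rule can be illustrated with a small Python class. This is a sketch, not TCMalloc's code: `FreeList` is a hypothetical name, objects are arbitrary values, and the central list is a plain Python list.

```python
class FreeList:
    """A thread-cache free list that tracks its low-water mark L."""

    def __init__(self, objects):
        self.objects = list(objects)
        self.low_water = len(self.objects)

    def pop(self):
        obj = self.objects.pop()
        # L is the shortest length seen since the last collection
        self.low_water = min(self.low_water, len(self.objects))
        return obj

    def push(self, obj):
        self.objects.append(obj)

    def collect(self, central):
        """Move L/2 objects to the central list, then restart the measurement."""
        n = self.low_water // 2
        for _ in range(n):
            central.append(self.objects.pop())
        self.low_water = len(self.objects)
        return n
```

A list that is never touched between collections has L equal to its full length, so it halves on every pass; a busy list has a small L and keeps most of its objects.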

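Go Web Development Notes

2021-05-06T04:01:29.000Z

A collection of notes on Go web development.

Go Web Development Notes

1 GoWiki: A Go Web Application Example

GoWiki is a minimal Go web application built with the standard library packages html/template and net/http. It implements the basics of a wiki: creating, editing, saving, and viewing pages.

This section summarizes the official tutorial Writing Web Applications.

1.1 Project Layout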

gowiki

wiki.go
edit.html
view.html

1.2 Code

1.2.1 wiki.go

// Writing Web Applications
// Official Example from https://golang.org/doc/articles/wiki/
package main

import (
	"errors"
	"fmt"
	"html/template"
	"io/ioutil"
	"log"
	"net/http"
	"regexp"
)

type Page struct {
	Title string
	Body  []byte
}

var templates = template.Must(template.ParseFiles("edit.html", "view.html"))
var validPath = regexp.MustCompile("^/(edit|save|view)/([a-zA-Z0-9]+)$")

func (p *Page) save() error {
	filename := p.Title + ".txt"
	return ioutil.WriteFile(filename, p.Body, 0600)
}

func loadPage(title string) (*Page, error) {
	filename := title + ".txt"
	body, err := ioutil.ReadFile(filename)
	if err != nil {
		return nil, err
	}
	return &Page{Title: title, Body: body}, nil
}

func getTitle(w http.ResponseWriter, r *http.Request) (string, error) {
	m := validPath.FindStringSubmatch(r.URL.Path)
	if m == nil {
		http.NotFound(w, r)
		return "", errors.New("invalid Page Title")
	}
	return m[2], nil // The title is the second subexpression.
}

func renderTemplate(w http.ResponseWriter, tmpl string, p *Page) {
	err := templates.ExecuteTemplate(w, tmpl+".html", p)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}

func viewHandler(w http.ResponseWriter, r *http.Request, title string) {
	p, err := loadPage(title)
	if err != nil {
		http.Redirect(w, r, "/edit/"+title, http.StatusFound)
		return
	}
	renderTemplate(w, "view", p)
}

func editHandler(w http.ResponseWriter, r *http.Request, title string) {
	p, err := loadPage(title)
	if err != nil {
		p = &Page{Title: title}
	}
	renderTemplate(w, "edit", p)
}

func saveHandler(w http.ResponseWriter, r *http.Request, title string) {
	body := r.FormValue("body")
	p := &Page{Title: title, Body: []byte(body)}
	err := p.save()
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	http.Redirect(w, r, "/view/"+title, http.StatusFound)
}

func makeHandler(fn func(http.ResponseWriter, *http.Request, string)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		m := validPath.FindStringSubmatch(r.URL.Path)
		if m == nil {
			http.NotFound(w, r)
			return
		}
		fn(w, r, m[2])
	}
}

func main() {
	http.HandleFunc("/view/", makeHandler(viewHandler))
	http.HandleFunc("/edit/", makeHandler(editHandler))
	http.HandleFunc("/save/", makeHandler(saveHandler))
	host := "127.0.0.1"
	port := 8080
	addr := fmt.Sprintf("%s:%v", host, port)
	fmt.Printf("goWiki start listening at http://%s\n", addr)
	log.Fatal(http.ListenAndServe(addr, nil))
}

1.2.2 edit.html

<h1>Editing {{.Title}}</h1>

<form action="/save/{{.Title}}" method="POST">
<div><textarea name="body" rows="20" cols="80">{{printf "%s" .Body}}</textarea></div>
<div><input type="submit" value="Save"></div>
</form>

1.2.3 view.html

<h1>{{.Title}}</h1>

<p>[<a href="/edit/{{.Title}}">edit</a>]</p>

<div>{{printf "%s" .Body}}</div>

1.3 Running

This is a single-file Go program, so it can be run directly:

go run wiki.go

or built first and then run:

go build wiki.go
./wiki

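2 Gin: A Go Web Framework

As the previous section shows, Go's net/http and html/template are enough to build a basic web application. However, the built-in router http.ServeMux is very simple: it only maps a request path (string) to a handler, and cannot route on the HTTP method or on request headers (this describes ServeMux before Go 1.22, which added method and wildcard patterns). Go web frameworks such as Gin offer richer features than the standard library.

Gin Web Framework
Gin is a web framework written in Go (Golang). It features a martini-like API with performance that is up to 40 times faster thanks to httprouter. If you need performance and good productivity, you will love Gin.

There are other Go web frameworks and routers as well, such as gorilla/mux and echo.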

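3 Database Storage

3.1 SQL

Go does not ship with any database drivers.

Instead, Go defines the database/sql interface, which separates the implementation from the caller, so switching databases requires little or no change to the calling code.

See: longjoy/micro-go-book/ch5-web/mysql/mysql.go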

import (
	"database/sql"
	"fmt"
	"time"

	_ "github.com/go-sql-driver/mysql" // assumed driver: registers "mysql" with database/sql
)

// db is the shared connection pool.
// User and checkErr are defined elsewhere in the referenced file.
var db *sql.DB

func init() {
	var err error
	db, err = sql.Open("mysql",
		"root:a123456@tcp(47.96.140.41:3366)/user?charset=utf8")
	checkErr(err)
}

// queryByName returns the first user with the given name, or a zero User if none matches.
func queryByName(name string) User {
	user := User{}
	stmt, err := db.Prepare("select * from user where name=?")
	checkErr(err)
	defer stmt.Close()

	rows, err := stmt.Query(name)
	checkErr(err)
	defer rows.Close()

	if rows.Next() {
		var id int
		var name string
		var habits string
		var createdTime string
		err = rows.Scan(&id, &name, &habits, &createdTime)
		checkErr(err)
		fmt.Printf("[%d, %s, %s, %s]\n", id, name, habits, createdTime)
		user = User{id, name, habits, createdTime}
	}
	return user
}

// store inserts a user and prints the generated id.
func store(user User) {
	stmt, err := db.Prepare("INSERT INTO user SET name=?,habits=?,created_time=?")
	checkErr(err)
	defer stmt.Close()

	t := time.Now().UTC().Format("2006-01-02")
	res, err := stmt.Exec(user.Name, user.Habits, t)
	checkErr(err)

	id, err := res.LastInsertId()
	checkErr(err)

	fmt.Printf("last insert id is: %d\n", id)
}

3.2 NoSQL

Go structs map naturally onto the JSON-like documents stored in NoSQL databases, so Go code can usually work with a NoSQL store directly, without an ORM.

See: longjoy/micro-go-book/ch5-web/mongo/mongo.go

func connect(cName string) (*mgo.Session, *mgo.Collection) {
	session, err := mgo.Dial("mongodb://47.96.140.41:27017/") // MongoDB connection
	checkErr(err)
	session.SetMode(mgo.Monotonic, true)
	// return the session and a handle to the named collection
	return session, session.DB("test").C(cName)
}

func queryByName(name string) []User {
	var user []User
	s, c := connect("user")
	defer s.Close()
	err := c.Find(bson.M{"name": name}).All(&user)
	checkErr(err)
	return user
}

func store(user User) error {
	s, c := connect("user")
	defer s.Close()
	user.Id = bson.NewObjectId().Hex()
	return c.Insert(&user)
}

3.3 beego/orm: A Go ORM Framework

Beego
Beego is used for rapid development of enterprise application in Go, including RESTful APIs, web apps and backend services.
It is inspired by Tornado, Sinatra and Flask. beego has some Go-specific features such as interfaces and struct embedding.

Beego is an easy-to-use framework for building enterprise Go applications, and it includes an ORM.

For details on using the Beego ORM, see its documentation:

ORM Usage

Plotting Image Attention with matplotlib

2021-04-12T02:29:56.000Z

matplotlib can visualize an image attention map by overlaying it on top of the image.

Plotting Image Attention with matplotlib

1 Result

2 Implementation

import matplotlib.pyplot as plt
import cv2
import numpy as np

# read target image
demo_img_path = r"res\img\36979.jpg"
demo_img = plt.imread(demo_img_path)
demo_img_h, demo_img_w, demo_img_c = demo_img.shape

demo_img_att = np.array([
    [0.1, 0.4, 0.4, 0, 0],
    [0, 0.4, 0.4, 0.4, 0],
    [0.4, 0, 0, 0.3, 0.4],
    [0.2, 0.1, 0, 0, 0.4],
    [0.3, 0.4, 0.1, 0, 0],
])
# resize the image attention to the target image size with interpolation
demo_img_att = cv2.resize(demo_img_att,
                          dsize=(demo_img_w, demo_img_h),
                          interpolation=cv2.INTER_CUBIC)

# plot with matplotlib
plt.figure(figsize=(9, 5))

# plot target image
plt.subplot(1, 2, 1)
plt.imshow(demo_img)
plt.axis("off")
plt.title("image")

# plot image with attention masked on it
plt.subplot(1, 2, 2)
plt.imshow(demo_img)
plt.imshow(demo_img_att, alpha=0.8, cmap="gray")
plt.axis("off")
plt.title("image with attention")

# shrink padding etc. to a tight layout
plt.tight_layout()

# save figure and show on display
plt.savefig("demo_img_att.png")     # to disk
plt.show()                          # on display

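The key points are:

Before plotting, interpolate the attention map so that it has the same size as the original image;
When plotting, draw the original image first and then the interpolated attention map on top of it, with the colormap (cmap) set appropriately.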

投资学笔记

2021-03-08T15:19:12.000Z

通过中央财经大学的投资学课程，系统性地学习基本的投资学原理。

投资学笔记

对投资做的好的大师，首先必须是控制风险的大师。

1投资的内涵及其与宏观经济的关系

1.1 投资与投资主体

投资的定义

为了（可能不确定的）将来的消费（价值）而牺牲现在一定的消费（价值）。

投资主体

家庭

家庭收入

投资
- 直接投资：资金的使用者和所有者一致；
- 间接投资：资金的使用者和所有者不一致。
消费
- 购买商品或服务

企业

企业收入用于

投资
- 研发投资、员工培训
- 建造厂房、购买设备
- 存货投资
- 金融投资
消费
- 工资福利
- 原材料

企业与家庭

企业的工资福利流入家庭，家庭的投资和消费流入企业。

政府

资金来源

税收
政府融资

政府收入用于

政府投资
- 直接投资
- 间接投资
政府支出
- 医疗等社会保障
- 公务员支出

投资活动

直接投资

居民或企业建造房屋等不动产
企业研发费用、存货投资
政府投资基础建设

间接投资

居民投资购买商品房或金融资产
企业购买金融资产
政府购买国内外金融资产

实物资产投资

居民建造房屋、投资购买商品房
企业建设厂房、购买设备、存货投资
政府投资基础设施

金融资产投资

居民投资股票、债券、基金或存款
企业购买股票、债券、衍生品等金融资产
政府购买国内外金融资产

1.2投资与宏观经济运行的内部逻辑

资产负债表

企业部门的金融负债和居民部门的金融资产接近。

宏观经济上，企业部门的负债主要来源于居民部门的金融资产。

金融泡沫

当股票等金融资产价格呈现上涨趋势时，

家庭投入更多收入到金融资产
企业融资或变卖实物资产到金融资产
银行的货币创造进一步推高金融资产价格

形成金融泡沫。

虚拟经济

虚拟经济与实体经济

虚拟经济与实体经济是此消彼长，又相辅相成的。

托宾Q系数

Q = 公司的市场价值/公司的重置成本

Q>1，投资者理性选择出售股票，然后重新建立同样的公司获得更多收益；
Q<1，投资者倾向并购公司扩张，而非重建。

实际情况Q一般大于1，因为存在专利、壁垒等……

1.3 投资与短期经济增长

投资-储蓄恒等式

C+S=C+I

居民消费C+居民储蓄S=企业所生产产品C+企业投资I

加入政府和国际经济体时：

C+S+T+Kr=C+I

1.4 政府投资与短期经济增长案例

维持经济稳定，应对金融危机的4万亿投资计划

29.50%，11800亿，来自中央预算内投资、重要政府性基金、中央财政其他公共投资以及中央财政灾后恢复重建基金；
70.50%，28200亿，来自地方财政预算、中央财政代发地方政府债券、政策性贷款、企业债券和中期票据、银行贷款以及民间投资。

影响：

央行资产负债表扩表，货币规模增幅超两倍；
社会融资规模高速增长，增幅超过200%；
地方政府债务规模扩大，违约可能性大大提高。

负面效应：

各部门债务高企、违约概率大大提高；
产能过剩（投资挤压消费）、结构失衡、亟待供给侧改革；
民营企业倒闭潮、工人失业；
房价飙升（M2增速很快），催生一系列民生问题。

1.5投资与长期经济增长和经济波动

投资与长期经济增长

索洛模型

只有生产和消费。

储蓄总是等于投资。

生产三大要素：资本K、劳动L、知识A。

资本投入存在边际效应。

索洛模型表明：在具有相同生产函数、储蓄和折旧率情况下，经济体最终所达到的均衡状态与初始状态无关。

提高储蓄率，可以增加资本投入，继而提高人均产出。

但是，储蓄率过分提高，会牺牲一代人的消费。

内生增长模型

将技术进步纳入内生变量。

纳入技术进步后，资本的边际报酬不再递减。

案例：供给侧改革

投资与经济波动

萨穆尔森-凯恩斯经济周期理论

由于投资存在乘数效应且产出存在加速效应，因此到达一定峰值后，经济就会衰退，到达低谷后，一些投资需求被刺激，又进入上升通道。

认为投资和消费本身的特性产生了经济周期。

实际经济周期理论

认为经济周期是外部因素引起的。

熊彼特的创新周期论。

卡莱斯基的政治周期论。

1.6 投资规模与投资效率

投资规模

年度投资规模（短期、流量）

在建投资规模（长期、存量）

投资结构

投资主体结构：

企业
- 利润最大化
- 自主投资、市场化运作
- 促进经济发展
政府
- 社会福利最大化
- 国企投资、政策影响
- 调整经济结构

投资效率的衡量

资本产出比

Capital-Output Ratio, COR

投资产出比

Incremental Capital-Output Ratio, ICOR

资本边际收益均一化原则

当市场有效且达到均衡时，各个部门的资本边际收益率呈现均一化的特征。

（否则资本会从低收益部门流向高收益部门）

衡量方式

调整推算法：对统计数据要求很高；
函数估计法：需要假定生产函数。

2 行业投资分析

2.1 行业的涵义与分类

2.1.1 行业的涵义

A group of firms whose products are substitutes for one another from the consumer's point of view.

2.1.2 行业的分类

道琼斯分类法

联合国国际标准行业分类法

我国国民经济行业分类法

我国上市公司行业分类法

证监会标准
申银万国分类标准
Wind分类标准

2.2 行业的生命周期

2.2.1 行业生命周期的阶段

四个阶段：

创业阶段：开始增长
1. 大众尚未认知该产品/服务，市场需求较小
2. 投资规模小，风险很大
成长阶段：快速增长
1. 大众认识到该产品/服务，市场需求迅速扩大
2. 销售收入迅速扩大，开始盈利
3. 产品需要完善，投资需求强烈，风险较大（新生事物政策风险大）
成熟阶段：缓慢增长
1. 市场趋于饱和，市场竞争（相对）垄断，少数企业分享高额利润
2. 产品成熟稳定，投资需求不大，风险较低，投资可获得稳定回报
衰退阶段：缓慢下滑
1. 本产品更新跟不上大量出现的替代品
2. 市场需求减少，销售利润下降，风险增大，不宜大量投资

2.2.2 行业发展阶段分析

特征	创业阶段	成长阶段	成熟阶段	衰退阶段
行业规模	较小	扩大	饱和	缩小
产出增长	较快	很快	较慢	很慢，甚至为负
利润水平	低	高	低	亏损
技术创新	较快	逐渐稳定	稳定	淘汰或被替代
竞争者数量	很多	增多	下降	降至不足
开工率	提高	满负荷	下降	降至不足
资本进退	进大于出	进大于出	进出平衡	进小于出

2.2.3 影响行业兴衰的因素

技术进步
社会习惯
产业政策
经济全球化

其他高壁垒行业、政府介入行业等，难以用行业生命周期解释。

2.3 行业与经济周期

2.3.1不同行业对经济周期的敏感度

	增长型	周期型	防御型
与经济周期的关系	受经济周期影响不大	与经济周期直接相关	产品需求相对稳定，受经济周期影响较小
增长的核心来源	技术进步等不受经济周期影响的因素	居民收入等受经济周期直接影响的因素	居民刚性需求
案例	计算机相关行业	汽车等行业	医药、生活必需品等行业
经济繁荣时	增长	增长	相对稳定
经济衰退时	增长	衰落	相对稳定

2.3.2 经济周期敏感度的决定因素

销售额对经济周期的敏感度
经营杠杆
财务杠杆

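2.3.3 Sector Rotation: Merrill Lynch's Investment Clock

Sector rotation means forecasting which industries or sectors will outperform in the current phase of the business cycle and shifting the portfolio toward them.

The macroeconomic cycle is divided into four phases:

Recession
1. Low growth, low inflation
2. Bonds
Recovery
1. High growth, low inflation
2. Stocks
Overheat
1. High growth, high inflation
2. Commodities
Stagflation
1. Low growth, high inflation
2. Cash

Investment strategy:

Cyclicality
1. When growth accelerates, invest in stocks and commodities and pick cyclical industries (e.g. autos, steel);
2. When growth slows, invest in bonds or cash and pick defensive industries (e.g. pharmaceuticals, utilities);
Duration
1. When inflation falls, invest in bonds or stocks and pick long-duration bonds or growth stocks;
2. When inflation rises, discount rates rise; invest in commodities and cash, or in value stocks with short duration and stable valuations.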
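The four phases of the clock reduce to a lookup on two yes/no inputs. The Python sketch below encodes exactly the table above; the function name and the boolean inputs are illustrative assumptions, and deciding whether growth or inflation counts as "high" is left to the caller.

```python
def investment_clock(high_growth, high_inflation):
    """Map growth and inflation levels to (phase, favoured asset class)."""
    table = {
        (False, False): ("recession", "bonds"),
        (True, False): ("recovery", "stocks"),
        (True, True): ("overheat", "commodities"),
        (False, True): ("stagflation", "cash"),
    }
    return table[(high_growth, high_inflation)]
```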

适用情况：

在美国有效，1973.4-2004.7超过30年，美国经济周期可较明确分为四个阶段，每阶段平均20个月左右，一个经济周期约6年。
在中国不完全适用，因为：
1. 央行货币政策逆周期调整对金融市场影响巨大（2011-2015经济稳步下降，2013年收紧货币使流动性趋紧、股市大跌，2015年降准降息等改革政策推动短期大牛市），
2. 经济转型的结构性调整也有很大影响（复苏阶段供给侧改革淘汰落后产能）。

2.4 行业的结构及其分析

2.4.1 行业的结构分析

特征	完全竞争	垄断竞争	寡头垄断	完全垄断
企业数目	众多	很多	较少	单个企业
生产要素流动性	完全自由流动	自由流动	较难流动	不流动
产品差异性	同质无差别	存在差别	同质或存在差别	无
企业定价能力	企业仅接受价格，无法制定价格	企业对价格有控制能力	企业对价格具有垄断能力	企业垄断定价，但受到法律管制
典型行业	初级产品（例如：农产品）	家电、洗发水等消费品	资本、技术密集型行业，少数储量集中的矿产品	公共事业，资本、技术高度密集型行业，稀有金属矿藏开采行业

2.4.2 波特的五力模型

供应商：供应商议价能力
购买者：买方议价能力
竞争对手：现有公司之间的竞争
潜在进入者：新进入者的威胁
替代品：替代品的威胁

2.4.3 其他行业分析工具

PESTLE模型

政治P
经济因素E
社会文化因素S
科技T
法律L
环保E

Industry Concentration Model

Concentration is measured as the cumulative market share of the top K firms in the industry.
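As a small worked example, the top-K concentration ratio can be computed as below. The function name and the sample shares used to try it are made up for illustration.

```python
def concentration_ratio(shares, k):
    """Cumulative market share of the k largest firms (shares as fractions of 1)."""
    return sum(sorted(shares, reverse=True)[:k])
```

For shares of 30%, 25%, 20%, 10%, 10%, and 5%, the top-3 ratio is 75%.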

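Concentration curve:

Rising: competition is fierce, and leading firms are enlarging their market through channel expansion, price cuts, and similar moves.
- A rapid rise signals opportunity; extra market spending and faster channel build-out tend to pay off.
Flat: the competitive structure is stable and the leaders' dominance is already established.
- A stable industry offers few openings; expansion will be resisted collectively by the leaders, so firms need to pursue segmentation and differentiation.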

散点市场->块状同质化市场->团状异质化市场

行业关键成功要素模型

列表格，对要素打分。

2.5 行业定量分析

2.5.1 市盈率

行业市盈率=行业（价格）指数 / 行业利润率

行业指数代表投资者对行业的估值；
行业利润率代表行业的盈利能力。

投资策略：同等条件下，尽量选择行业指数和行业利润率较高，而市盈率较低的行业。

注意：行业之间的市盈率不具备可比性。

2.5.2 回归分析法估计行业收益率

利用行业的历史数据回归估计行业收益率。

注意：有效市场中，未来股价不受过去股价影响，用现在收益率难以预测未来收益率。因此一般很少用定量分析预测行业未来，常用定性分析和经验。

3 项目投资评估方法

略

4 融资与创新

金融工具一般按期限来分类：

一年期以上，称为资本市场，高风险高收益；
一年期以下，称为货币市场，大多流动性好、信用安全。

4.1 债券市场

4.1.1 债券市场概况

债券市场可追溯到1792年纽约股票交易所。

美国债券市场

美国政府债券：
1. 短期国库券（Treasury Bills）：90天~1年不等；
2. 中期国库票据（Treasury Notes）：2~10年不等；
3. 长期国债（Treasury Bonds）：10~30年不等。
4. 以政府信用担保，还可免交州及地方税。
市政债券：州和地方政府发行，类似我国的地方债
1. 一般责任债券：由发行者的信用（财政能力、收税能力）支撑；
2. 收入债券：由地方基建、公共服务的收益支撑；
3. 安全性：一般责任债券>收入债券
4. 也可免税
政府机构债券
1. 通常是联邦政府级机构发行，用于资助和公共政策相关项目，如：农业、小企业、首次购房者贷款。
公司债券
1. 即使大公司也有风险，会受到经济、管理及竞争等的影响
2. 发行需经过评级认定，有：标准普尔、穆迪、惠誉

4.1.2 债券分类及简介

国债

形式：

凭证式国债：记名、可挂失、仅银行网点、财政部门国债服务部发行，不上市流通；
记账式国债：1994年开始发行，电脑系统账户，记名、可挂失、效率高、简便；
不记名（实物）国债：不记名、不挂失、可流通。

风险：安全性非常好，近乎于现金等价物。

期限：

短期国债
中长期国债

目的：

央行利用短期国债做公开市场运作
作为市场无风险利率基准
第二准备金
筹集财政资金

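Issuance:

Fixed-yield sale
Public auction: competitive bidding;
1. American auction ("ascending-price auction"): the weighted average winning price becomes the issue price for that issue
  1. Winning competitive bidders each buy at their own bid price; these are usually large institutional investors;
  2. Winning non-competitive bidders buy at the weighted average winning price; these are usually small and medium investors.
2. Dutch auction ("descending-price auction"): the price is lowered step by step until the first bidder accepts (or the reserve price is reached); the lowest successful bid becomes the issue price, and every winner buys at that price;
  1. Higher bids are filled first; at the same price, earlier bids are filled first;
3. Hybrid auction: a mix of the two, with the weighted average winning price as the issue price;
  1. Winning bids at or above the issue price buy at the issue price (Dutch style);
  2. Winning bids below the issue price within a set range buy at their own bid price (American style);
  3. Bids below the issue price by more than that range are rejected.
Continuous distribution: over-the-counter sales;
Underwritten issue: direct placement;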
地方政府债券

地方政府募资发行。

目的：

帮助国家实施财政或货币政策；
公共设施建设；
弥补财政赤字。

规模：远远大于国债发行数量。

还本付息受财政、地方发展水平影响，信用比国债低些。

央行票据

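Central bank bills are short-term debt instruments issued by the central bank to adjust commercial banks' excess reserves; in substance they are central bank bonds.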

目的：不是为了筹集资金，而是央行调节基础货币的货币政策工具，为了减少商业银行可贷资金量。

大额可转让存单CDs

因资金流向债券，商业银行需要吸引储蓄稳定存款。

存单注明存款期限和利率，可到期取本息，也可到期前转让，可在二级市场流通。

同业存单

银行和基金公司间流通，同业拆借。

债券回购

回购协议：先卖出债券，再回购。相当于以债券作为抵押品。

金融债

金融机构发行的债券。

公司债

企业发行的债券，受证监会管理。

风险和收益都高于国债。

企业债

在西方，企业债即公司债。

在我国，企业债券是中央政府部门所属机构、国有独资企业或国有控股企业发行的债券。

企业债的发行与政府部门的审批项目直接相关，发行由发改委审批。

商业票据

金融公司或高信用企业开出的无担保短期票据。

期限在1个月~1年，通常滚动发行，为旧票据还本付息。

短期融资券

企业在银行间市场发行，由金融机构购买，不向社会发行，一年期内还本付息的有价证券，是短期贷款的替代品。

无担保、短期、需评级。

中期票据

企业在银行间市场发行，是中期贷款的替代品。

可转债

可选是否将债券一定比例转为股权，赋予了债券一定程度的期权能力。

国际债券

可以在海外发行的债券。

私募债

中小型企业的募资需求。

4.2 资产证券化

4.2.1 基本理论

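Asset securitization is a form of financing in which tradable securities are issued backed by a specific pool of assets or a specific cash flow.

It began in 1970 in the United States with mortgage-backed securities (MBS), whose underlying assets were pools of mortgage loans (e.g. residential mortgages). The technique later spread from mortgages to other assets (e.g. auto loans and consumer loans), giving rise to asset-backed securities (ABS).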

本质特征：

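Asset securitization is the process of raising funds in the capital market by issuing securities backed by predictable cash flows.
The assets producing those cash flows can be physical (e.g. highway tolls) or non-physical (e.g. repayments on residential mortgages, auto loans, or credit cards).
Essence: on the surface the securities are backed by assets, but in fact they are backed by the cash flows those assets generate.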

影响：

模糊了直接融资与间接融资之间的清晰界限，也显示出直接融资的发展前景。
资产证券化利用资本市场对资产的收益与风险进行分离与重组。
资产证券化是债券市场深化的助推器。

4.2.2 基本结构

原始债务人（Obligors）

承担债务，需还本付息。例如，抵押贷款中的借款方。

原始债权人（Originators）

享有债权。例如：抵押贷款中的放款银行。

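As the sponsor of the securitization, the original creditor sells the assets to be securitized to a special purpose vehicle, which allows the assets' risk and return to be separated and repackaged.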

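Special Purpose Vehicle (SPV)

A special entity that buys securitizable assets from the sponsor and issues securities backed by them.

It is generally a highly rated entity that is structured not to go bankrupt.

Because the original creditor makes a true sale of the assets to the SPV, the risk of the securitized assets is isolated from the risk of the original creditor (bankruptcy remoteness). Even if the original creditor goes bankrupt, investors' claims on the securitized assets are unaffected, which raises the credit rating of the securities and lowers the cost of financing.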

投资者（Investors）

购买证券的机构或个人。

因为这类证券通常高收益低风险，因此一般是机构投资者购买，如保险公司、投资基金和银行机构。

专门服务人（Servicer）

一般由发起人兼任。

负责按期收取证券化资产所产生的现金流，并转移给SPV或SPV指定的信托实体。

信托机构（Trustee）

由SPV指定的负责对专门服务人收取的现金流进行管理，并向证券投资者按时支付的机构。

信用评级机构（Rating Agency）

通过对资产证券化各个环节进行评估而给出信用等级的机构。

对证券进行信用增级，降低发行成本。

担保机构（Guarantors）

为SPV发行证券提供担保的机构，为证券进行信用增级。可以是政府担保机构或私人担保公司。

证券承销商（Underwriters）

为SPV所发行证券进行承销的实体，确保证券销售成功。一般是投资银行，或组建的承销团。

4.2.3 基本过程

组建SPV
SPV筛选可证券化的资产组成资产池（asset pool）
SPV与资产相结合阶段（原始权益人真实销售资产给SPV）
SPV发行资产支持证券阶段
SPV清偿债券阶段

4.2.4 中美资产证券化产品

美国

住房抵押贷款证券（MBS）；
资产支持证券（ABS）：汽车贷款证券、信用贷款证券、学生贷款等；
以MBS+ABS现金流为抵押品再证券化的抵押债券凭证（CDO）；

中国

信贷资产证券化：央行和银监会主管，以信贷资产为基础资产。
券商专项资产证券化：证监会主管，以企业应收款、信贷资产、信托收益权、基础设施收益权等财产权利、商业票据、债券等衍生品、股票及衍生品、商业物业等不动产为基础资产。
资产支持票据：交易商协会主管，以公用事业未来收益权、政府回购应收款、企业其他应收款为基础资产。

4.3 股权市场

4.3.1 优先股与普通股

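Preferred Stock

Preferred stock has characteristics of both equity and debt: its residual claim ranks ahead of common stock.

Dividends: it is usually classed as a fixed-income instrument because, like a bond, it promises a fixed dividend set in advance.

Priority: it ranks ahead of common stock both in the distribution of profits and in liquidation, but behind debt in both.

Shareholder rights: preferred stock ranks below common stock in residual control; holders cannot take part in running the company and have no right to elect the board of directors or the board of supervisors.

Common Stock

Common stock is the equity contract that shares in the company's profits and assets only after preferred claims have been met. Its dividends have neither a cap nor a floor, and the payout in any period is uncertain.

Shareholder rights: the right to attend shareholder meetings, to vote, to elect, and to be elected; residual control is exercised by voting (usually one share, one vote, with a simple majority).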

案例：阿里巴巴的双层结构的普通股，分为：

A股：无投票权->一股一票；
B股：一股一票->数倍于A股的投票权（10~150倍）；

4.3.2 股票种类

A股

人民币普通股

B股

人民币特种股票。以人民币标明面值，以港币或美元交易。

H股

国企股，注册地在内地，上市地在香港的股票。

N股

注册地在中国大陆，上市地在纽约证券交易所。

L股

注册地在中国大陆，上市地在伦敦。

蓝筹股

稳定盈利的大公司发行，定期分派股利，投资价值较高的股票。

起源于赌场中蓝色筹码最值钱。

红筹股

在境外注册、香港上市的中国大陆概念的股票。

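ST Shares

Special Treatment shares: stocks flagged for special treatment, for example after two consecutive fiscal years of net losses, or when net assets per share fall below the par value (1 yuan).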

4.3.3 股票指数

功能

投资指南；
衍生工具的标的：股指期货、股指期权的标的；
宏观经济景气度的指示器。

种类

综合指数：全样本指数（上证综合指数、纽交所综合指数）；
成分指数：部分样本指数（标普500、伦敦金融时报100、沪深300），一般市值大、交易量大、业绩良好、业务稳定；
分类指数：具有相同特征（相同行业）的股票指数（上证行业指数，分为：材料、公用、能源、金融、电信、工业、可选、信息、消费和医药行业）。

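Construction Methods

Simple arithmetic average
Adjusted arithmetic average (Dow adjustment): introduced by Dow Jones in 1928; it corrects for stock splits, capital increases, and similar events that would otherwise shift the average, keeping it continuous and comparable.
Market-capitalization weighting: the mainstream method today; weights reflect company size rather than share price, which makes the index more comparable.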
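The simple average and the capitalization-weighted method can be compared with a short Python sketch. The base level of 1000 points and the function names are assumptions made for illustration; real index providers also adjust the base capitalization for corporate actions.

```python
def simple_average_index(prices):
    """Simple arithmetic mean of constituent prices."""
    return sum(prices) / len(prices)


def cap_weighted_index(prices, shares, base_cap, base_point=1000):
    """Current market capitalization relative to the base period, scaled to base_point."""
    cap = sum(p * s for p, s in zip(prices, shares))
    return cap / base_cap * base_point
```

With two stocks priced 10 and 20 and with 100 and 50 shares outstanding (base capitalization 2000), a rise of the first stock to 11 lifts the weighted index from 1000 to 1050.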

中国大陆股指

中证股票指数体系：上证+深证共同成立
1. 中证流通指数
2. 沪深300指数：沪市+深市按日均成交金额排序选取300只A股
3. 中证规模指数
4. 中证500指数：由全部A股中剔除沪深300指数成份股及总市值排名前300名的股票后，总市值排名靠前的500只股票组成，综合反映中国A股市场中一批中小市值公司的股票价格表现。
上证指数体系
1. 上证综指：全样本
2. 上证50：规模最大、流动性最好的50只
3. 上证180：总市值和成交金额靠前的180只
4. 上证380：成长性好、盈利能力强的新兴蓝筹股
深证指数体系
1. 深圳成分指数：深圳成指
2. 中小板综指：100家主要中小板股票
3. ChiNext Index: 100 stocks from the ChiNext board
新华富时中国A50指数：新华财经与英国富时合资，包含A股市值最大的50家，满足国内/外（QFII）投资者实时可交易。

全球主要股指

美国
1. 道琼斯指数
2. 纳斯达克指数
3. 标准普尔指数
英国
1. 富时100
法国
1. 法国CAC40
德国
1. 德国DAX
日本
1. 日经225指数
中国香港
1. 恒生指数
韩国
1. 韩国综合指数
澳大利亚
1. 澳洲标普200
印度
1. BSE SENSEX (Mumbai)
俄罗斯
1. 俄罗斯RTS
巴西
1. 圣保罗IBOV

4.4 项目融资

4.4.1 概念

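Project finance: a type of financing in which lenders provide loans to a specific project, have a claim on the cash flows the project generates for repayment, and take the project's assets as collateral.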

4.4.2 种类

无追索权的项目融资

贷款的还本付息完全依靠经营效益。

有限追索权的项目融资

要求与项目有利害关系的第三方当事人提供各种担保。

贷款银行有权向担保方追索，以担保金额为限。

4.4.3 融资方式

政府主导方式

传统方式，本质是依靠政府负债。

利：运作简单，速度快，政府信用好。

弊：财政压力大、建设运营责任不清、资金利用效率低下。

BOT方式

Build Operate Transfer

政府特许授权投资公司去建设、运营，在一定期限（如：三十年）后，最终转让给政府。

以特许经营权为主。

利：市场竞争机制、减轻政府财政负担、提高项目运营效率、引入管理与技术；

弊：风险大，因为投资大、期限长、条件差异大、缺乏先例可循。

BT方式

Build Transfer

政府通过招投标，交给投资者去融资和建设，最后移交给政府。政府按协议分期支付项目投资与回报。

以项目外包为主。

弊：建设费用大、监管难、分包严重、质量得不到保证。

其他方式

BOOT: Build Own Operate Transfer
BOO: Build Own Operate
BTO: Build Transfer Operate
TOT: Transfer Operate Transfer

4.4.4 PPP项目融资模式

概念

Public-Private Partnership

政府与私人部门组成特许经营公司，引入社会资本。

政府补贴PPP项目，社会投资者投资PPP项目。

政府与私人部门风险共担，利益共享。

适用

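Infrastructure and public-service projects with large investment scale, stable long-term demand, a flexible price adjustment mechanism, and a fairly high degree of marketization. Example: subways.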

本质

债权上，将政府债务转化为企业债务；
运营上，引入市场竞争与激励机制，发挥各方优势，提高公共产品与服务的供应质量和效率。

PPP项目的资产证券化

运营管理权与收费收益权分离，将收益权作为基础资产。

按规定，PPP项目资产证券化的基础资产必须追到项目本身，不能以地方政府为基础资产，不能随意承诺保底、安排回购、明股实债等方式担保融资，但可以以财政补贴作为PPP项目收入的来源。

PPP项目期限一般为10~30年，比资产证券化产品期限（多数在7年以内）要长得多。

利：

盘活存量PPP项目资产：增强资金流动性与安全性；
吸引社会资本投入公共服务；
提升项目稳定运营能力：风险隔离。

类型：

使用者付费：经营性项目；
可行性缺口补助：准经营性项目，使用者付费不足以满足成本回收与合理回报时，政府提供缺口补助使项目可行；
政府付费：非经营性项目，如，垃圾处理、污水处理、市政道路。

5 证券的发行与交易

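The market where securities are issued is called the primary market; the market where they are traded is called the secondary market.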

5.1 证券的发行

5.1.1 证券发行市场

概念

又称“初级市场”、“一级市场”，是证券发行主体发行和推销新证券所形成的市场。

证券发行者->（中介机构）->投资者，期间受监管者（证监会）监管。

公募和私募

公募公开发行，经过严格审查，因此信用高；可公开交易，因此流动性好；但成本高；

私募不公开发行，不经严格审查，发行程序简单，因此成本低；不公开上市，因此流动性差。

直接发行和间接发行

直接发行：自营发行，发行者直接发售证券给投资者；

间接发行：承销发行，发行者委托承销商代为发售证券，承销商收取代理费，并承担发行责任与风险。

担保发行和无担保发行

主要用于债券发行。

担保发行：发行人以信用或实物担保，承诺证券收益；

无担保发行：不提供任何担保，例如：国债、部分金融债因违约可能性极低，一般无担保。

5.1.2 证券发行制度

注册制

以美国联邦证券法为代表，遵循公开原则，实质上是发行公司的财务公开制度。如果信息误导，投资者有权起诉。

证券主管机关对证券发行信息资料做审查，不禁止质量差、风险高的证券上市，由市场判断公司价值。

利：政府干预少；流程快；上市成本低。

弊：门槛高，只适用于发达成熟的市场（需要投资者理性，且发行者、承销商等机构恪守法律与职业道德）。

核准制

以欧洲各国公司法为代表，实行实质管理原则，发行者必须符合公司法规定的实质条件（经营性质、管理人员资格、资本结构、偿债能力等）。

适用于证券市场历史短、投资者素质不高的地区和国家（欧洲大陆、发展中国家）。

中国的证券发行制度历程

2001年前：发行审批制：地方与中央双重审批，获取配额，证监会复审；
2001年后：发行核准制；
1. 2001~2004：通道制，承销商（证券公司）推荐公司发行股票；
2. 2004年后：保荐制，承销商推荐并一定程度担保公司质量，保荐责任必须落实到个人。
2013年后：注册制提出；

发行制度对比

	中国大陆	中国香港	美国
发行上市制度	发审制转向核准制	高度市场化的核准制	注册制
审核时间	6个月	4个月	3-4个月
审核内容	实质审核	实质审核	形式审核
审核标准	监管部门严格审核资本结构、性质等	按《上市规则》规定指标	信息披露真实性

5.1.3 证券承销制度

承销商收承销费帮发行人销售股票和债券。

包销

全额包销：证券承销商全部购入，然后再转售给投资人；
余额包销：证券承销商按发行额，在发行期限内向投资人发售证券，到期未售出的证券由承销商负责认购。

代销

承销期结束时，将未售出的证券全部退还给发行人。

5.1.4 股票的发行

初次发行

三种情况

新建股份公司时发行股票；（设立发行）
原非股份制公司改制为股份制时发行股票；（设立发行）
原私人持股公司转为公众持股公司时发行股票；（首次公开发行，IPO）

IPO

Initial Public Offerings

IPO包含几个阶段，各阶段可按需并行：

计划筹备阶段
1. 寻求政府支持
2. 引入战略投资
3. 公司内部治理结构调整
4. 承销机构和其他中介早日入场
申报材料阶段
1. 审计报告和核准时间
2. 相关政府批文
3. 法律问题
4. 招股说明
发行审核阶段
1. 申报材料
2. 综合处收材料并分送预审员
3. 预审员审核并向企业提问
4. 形成反馈意见
5. 回复反馈意见
6. 通过预审会
7. 上发审会
8. 准备材料或退回材料
路演与询价阶段
1. 路演准备工作
2. 预路演：确定价格区间
3. 网下路演：面向网下机构投资者（调整价格）
4. 信息披露
5. 网上路演：面向散户投资者
上市阶段
1. 向交易所递交上市申请
2. 通过上市委员会审核
3. 刊登上市公告书
4. 上市仪式
5. 上市后市场维护
6. 持续督导

增资发行

SEO, Seasoned Equity Offerings

有偿增资发行

向原股东配股：准许老股东按一定的配股价格优先认购新股票；
向第三者配股：向股东以外的第三者（公司职工、公司往来客户、社会大众等）以新股认购权的方式配发新股。

无偿增资发行

原股东无需缴付股款即可获得新股。

通常目的是调整资本结构或将积累资本化。

形式有：

无偿交付：盈余公积转为股份；
红利增资：将分红改为股份；
股份分割：一股拆多股；
债券股票化：债券转为股票。

有偿无偿混合增资发行

按比例同时进行有偿和无偿增资。

股票发行的价格

影响因素：

企业自身状况
1. 经营业绩和发展前景
2. 净资产
3. 发行数量
宏观环境因素
1. 宏观政策
2. 所处行业
3. 股票流通市场

5.1.5 债券的发行

债券的评级

债券的发行需要评级

发行类型

定向发行：私募、面向特定投资者；
承购包销：商行、券商组成承销团；
招标发行：招标竞价确定发行价格；
直接发售：券商或银行柜台直接销售。

5.2 证券的交易

5.2.1 证券交易市场

概况

也称“二级市场”，是已发行的证券在证券市场上买卖的活动。

证券交易包含：

股票交易
债券交易
基金交易
金融衍生工具交易

交易所市场

证券交易所，会员资格才可交易，信息及时披露。二级市场中的第一市场。

会员制交易所

券商自愿组织的社会团体，会费共担，不以营利为目的。

会员既有交易权，也有交易所的所有权。

案例：上海证券交易所、深圳证券交易所

利：

不以营利为目的，费用低；
会员制避免违法行为；
损失由买卖双方自负，规范双方行为；
有政府支持，无破产可能；

弊：

缺乏第三方担保；
管理者同时也是交易者，有悖公平原则；
非会员不能进，容易垄断。

公司制交易所

商行、券商、信托等企业共同出资建立，以盈利为目的的公司法人。

案例：西方发达国家、中国香港都是公司制，20世纪90年代，为全球主要交易所采用。

交易权与所有权分离，会员无需拥有交易所所有权，也可拥有交易权。只有经过注册的券商才能进入交易大厅直接参加买卖。

利：

第三方担保，若会员违约造成损失，交易所负责赔偿；
管理权与所有权分离，交易者、管理者、所有者三方分离，交易所不偏袒任何一方；
服务优质：为盈利，不得不提供良好服务。

弊：

利益驱使，利润取决于交易额，滋长过度投机；
交易所是有限公司，不排除倒闭可能。

场外交易市场

定义

OTC, Over the Counter

在证券交易所以外，由证券买卖双方直接议价成交的市场。

特点

非集中：无交易场所、交易时间、交易规则限制；
开放式：无会员制；
种类多：上市或未上市证券都可交易；
议价方式不同：做市商和买卖价差的报价。

第二市场（柜台交易市场）

最早形成，公开但未上市发行的证券，如：地方债、市政债、公司债。

第三市场（大宗交易市场）

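The market in which listed securities are traded outside the exchange. After 1975, when exchange members were allowed to set their own commissions, the growth of the third market slowed.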

节约交易所内大宗交易的高昂佣金。

第四市场（场外网络市场）

买卖双方不经过经纪人，而是通过网络直接大宗交易。

利：成本低；速度快；保密；不冲击证券市场。

弊：给金融监管带来挑战。

全球主要交易市场

美国证券市场

起源于政府债券。

纽约证券交易所

英国证券市场

随股份公司涌现和信用活动开展而发展。

伦敦证券交易所（前身自1773年）
利物浦证券交易所（1827年成立）
曼彻斯特证券交易所（1830年成立）

特点：

发行业务专业化，由各种证券金融业分担；
中小企业比重较大；
外国证券比重较大；
政府证券比重较大（伦敦证交所是最大的“金边债券”市场（早期英国政府公债带有黄边，且可靠性高，因此称为金边债券））。

中国证券市场

我国资本市场结构：

主板：也称一板市场，上市要求最高；
创业板：也称二板市场，深交所，上市要求适合中小企业；
新三板：全国中小企业股份转让系统；
区域性股权交易市场：省市级，仅用于地区内；
柜台市场：2012年证监会“限定私募、先行起步”，开展柜台试点。

5.2.2 证券交易机制

报价制度，做市商制度，主要用于柜台市场；

指令驱动制度，竞价方式，主要用于交易所。

做市商制度

A market maker quotes a bid price and an ask price and earns the spread between buyers and sellers.

垄断型做市商

案例：纽约证券交易所

信息综合能力强，价格竞争性差，高额利润，易于监管

竞争型做市商

多元做市商制度，案例：纳斯达克交易所

竞争使市场活跃，交易量增加。做市商信息分散，无垄断地位，交易利润少。

指令驱动制度

竞价市场，买方订单和卖方订单通过经纪商进入市场，交易中心以买卖双向价格为基准进行撮合。

集合竞价

在一定时段内累积订单，到一定时刻再撮合定价。（通常是开市前10分钟）

连续竞价

在交易日各个时刻连续进行，只要存在匹配订单，交易即发生。

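The Order-Driven Trading Process

Account opening: the investor opens accounts with a broker
1. Securities account
2. Cash account
Order placement: the investor instructs the broker to buy or sell
1. Market order: buy or sell at the prevailing market price;
2. Limit order: set a maximum buying price or a minimum selling price;
3. Stop order: becomes a market order once the market price falls below the seller's stop price (sell stop) or rises above the buyer's stop price (buy stop, typically used in futures markets to avoid buying at too high a price);
4. Stop-limit order: a stop combined with a limit.
Auction and execution: the core of the trading system, where the price is determined
1. Call auction: buy orders are sorted by price in descending order and sell orders in ascending order, then matched to find the reference price.
2. Continuous auction: each incoming order is matched immediately; orders that cannot be filled queue by price priority, then by time priority at the same price.
Settlement: after execution, work out the securities and cash each side must deliver and receive
1. Trade-by-trade settlement: settling each trade individually is costly and suits large block trades with few executions;
2. Net settlement: within the agreed settlement period, only net amounts are exchanged, as on the Shanghai and Shenzhen exchanges.
  1. Clearing:
    1. First-level clearing: between brokers, through the central depository and clearing company;
    2. Second-level clearing: between a broker and its investors.
  2. Delivery: the buyer pays cash and receives the securities; the seller hands over the securities and receives the proceeds.
    1. T+0: same-day delivery;
    2. T+1: next-day delivery (China mainly uses T+1);
    3. T+n: delivery after n days.
Transfer of title: required for stocks and registered bonds
1. China has paperless trading, so there is no need to register the transfer with the issuing company.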
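The call auction step of the process can be sketched in Python. This is one common rule, chosen here as an assumption: pick the price that maximizes matched volume, with ties going to the lowest candidate price. Real exchanges add further tie-breaking rules, and the function name is hypothetical.

```python
def call_auction_price(buys, sells):
    """Pick the price that maximises matched volume in a call auction.

    buys/sells: lists of (limit_price, quantity). A buy can trade at any
    price <= its limit, a sell at any price >= its limit.
    Returns (price, volume).
    """
    best_price, best_volume = None, 0
    for price in sorted({p for p, _ in buys + sells}):
        demand = sum(q for p, q in buys if p >= price)   # buyers willing to pay this much
        supply = sum(q for p, q in sells if p <= price)  # sellers willing to accept this much
        volume = min(demand, supply)
        if volume > best_volume:
            best_price, best_volume = price, volume
    return best_price, best_volume
```

For buy orders of 100 at 10.2, 200 at 10.1, and 300 at 10.0 against sell orders of 150 at 9.9, 100 at 10.0, and 200 at 10.1, the auction clears 300 units at 10.1.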

混合交易制度

在做市商制度中引入竞价交易制度，如：1997年后的纳斯达克；
在竞价交易制度中引入做市商制度，如：1986年后的伦敦交易所。

交易机制对比

交易机制	做市商市场	竞价市场
竞争方式	报价驱动	指令驱动
价格发现	无正式程序	正式的市场开盘
监管	直接监管少，靠竞争改进缺陷	直接监管
竞争	做市商之间	客户之间
优点	成交及时；价格稳定；存货机制纠正买卖不均衡；做市商持仓抑制股价操纵	透明度高；信息传递快；运行费用低
缺点	因买卖集中在做市商手中而缺乏透明度；交易成本高；监管成本增加，难度大；	难以处理大宗交易；冷门股票成交持续萎缩；价格波动剧烈；价格难以维护，容易被操纵

5.3 量化投资

量化投资是利用计算机技术，采用数学模型实现投资理念、投资策略的过程。

5.3.1 算法交易

数学建模+计算机自动化（半自动化）交易。

目的

将大额交易分割为许多小额交易来应对市场风险和冲击。（避免大额交易被发现，使行情向不利于自己的方向发展）
因为大型交易者不会一次性暴露自己的所有交易指令，因此实际的交易机会很多，可以通过算法发现交易机会。

发展

2006年，欧美有三分之一的股票交易量由算法交易完成。

常用算法交易策略

算法交易核心问题：平衡冲击成本与等待风险。

交易太快，可以快速完成交易目标，减小等待风险，但会冲击市场，影响价格走势；
交易太慢，可以避免冲击市场，但无法快速完成交易，存在等待风险。

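Representative passive algorithmic trading strategies:

VWAP

Predict how the day's trading volume is distributed across time slices and trade in those proportions to minimize market impact.

Standard VWAP: a static forecast of the day's volume distribution;

Improved VWAP: adjust the traded quantity according to how the market price moves.

TWAP

Do not forecast the volume distribution over the trading period; weight by the length of each time slice instead.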
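The two slicing rules differ only in their weights, as the Python sketch below shows. It assumes equal-length time slices for TWAP and a given predicted volume profile for VWAP; how that profile is forecast is outside the sketch, and the function names are illustrative.

```python
def twap_slices(total_qty, n_slices):
    """TWAP: split an order evenly across equal-length time slices."""
    return [total_qty / n_slices] * n_slices


def vwap_slices(total_qty, volume_profile):
    """VWAP: split an order in proportion to a predicted intraday volume profile."""
    total = sum(volume_profile)
    return [total_qty * v / total for v in volume_profile]


def vwap(trades):
    """Volume-weighted average price of (price, quantity) trades."""
    return sum(p * q for p, q in trades) / sum(q for _, q in trades)
```

With a U-shaped profile such as [4, 1, 1, 4], a VWAP schedule concentrates the order at the open and the close, while TWAP spreads it evenly.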

PEG

盯住盘口策略，买入按当前最高买价，卖出按当前最低卖价发出限价交易。若交易未完成且成交价远离限价指令，则撤销，并重新循环。

IS策略

减小实际成交价与目标价的价差，分激进、中性和保守策略。

SOR策略

下单路径选优策略，从做市商、交易所、暗池等路径择优交易。

5.3.2 高频交易

利用高速计算机，在极短时间内判断有价值信息，先于其他投资者进行交易。例如：利用交易所之间的微小价差，大量地不停地买卖。

5.4 暗池交易

5.4.1 概念

买卖双方匿名配对进行大宗股票交易，主要为机构投资者，运作不透明。

机构投资者不希望公开寻找交易对手，而是希望避免市场冲击并保持信息保密（例如：防止被高频交易套利）。不借助公开交易市场，又存在搜寻成本高的问题。

5.4.2 分类

独立暗池

经纪公司组织，收取手续费，为机构投资者提供交易平台。

内部撮合池

证券经纪商组织，在内部对自营业务的订单与客户订单之间进行撮合，避免交易所交易的费用。

监听目标池

对冲基金和电子做市商组织，只接受/取消订单，根据发来的订单，决定是否交易。

联合暗池

多家金融机构共同组织，作为二级暗池，处理各家机构内部撮合池未能完成的订单余额的撮合。

5.4.3 特点

保密
撮合方式类似订单驱动的电子竞价市场

5.4.4 利弊

为客户保密，一方面保护知情交易者和大宗交易者的利益，另一方面妨碍外部投资者知情和市场定价效率。
流动性分离，一方面在暗池中为客户提供更多流动性，另一方面造成市场分割，与交易所争夺流动性。

6 投资公司

6.1 投资公司的类型

金融中介，本质特征是资产集合，汇集资金并投资证券。

6.1.1 单位投资信托

成立后，资产组合固定不变，是无需管理的基金；

主要投资固定收益资产组合。不需要主动管理，因此费率低。

6.1.2 投资管理公司

两类投资管理公司

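Open-end funds, i.e. mutual funds, can issue or redeem shares at any time; the formal name in China is "securities investment trust fund".
Closed-end funds cannot redeem or issue shares.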

共同基金的运作

汇集大量投资者形成集合投资，基金的资金存于基金托管人，由基金管理人指令管理被托管资金，组合投资于一系列的证券。

6.2 基金的分类

6.2.1按申赎方式：开放式/封闭式基金

	封闭式基金	开放式基金
期限	5年以上，多数15年	无固定存续期
规模	不变	可变
价格	供求关系	净值
策略	无赎回、无准备金、可长期投资	有赎回、有准备金、无法全额用于长期投资
激励机制	缺乏	按总额收取管理费，若业绩差，则资金赎回流失

6.2.2按组织形式：契约型/公司型基金

	契约型基金	公司型基金
投资者地位	受益人，无发言权	既是受益人，也是股东（有发言权等股东权利）
资产运用依据	按契约	按公司章程
融资渠道	不能向银行借款	公司有法人资格，可以向银行借款
运营方式	按契约期运作	按公司法运作，除非破产清算，否则有永久性
资金性质	收益凭证	股票

中国基金以契约型为主，美国则以公司型居多。

6.2.3按投资对象：货币/股权/固定收益/混合/指数

6.2.4按投资目标：成长型/平衡型/收入型

6.2.5 按募集方式：公募/私募

私募基金：无需披露信息，监管不严，隐蔽。如美国的对冲基金，采取合伙制度。

6.3 基金的募集和交易

6.3.1 渠道

银行

基金公司官网

第三方理财平台

证券公司代销

6.3.2 交易原则

未知价：申赎时不知道资产成交价格；
按金额申购，按份额赎回

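6.3.3 Subscription and Redemption

Subscription Price

Subscription price = net asset value per unit + front-end load

Net asset value (NAV) per unit: (fund assets - fund liabilities) / fund units outstanding
Front-end load: the subscription fee rate

Redemption Proceeds

Redemption proceeds = gross redemption amount - redemption fee

Gross redemption amount = units redeemed × NAV per unit on the redemption date
Back-end load = gross redemption amount × redemption fee rate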
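The formulas translate directly into code. In the Python sketch below, the front-end load per unit is taken to be NAV times the subscription fee rate, which is an assumption since the notes only say the load is charged at that rate; the function names and sample figures are illustrative.

```python
def nav_per_unit(assets, liabilities, units):
    """Net asset value per unit: (assets - liabilities) / units outstanding."""
    return (assets - liabilities) / units


def subscription_price(nav, front_rate):
    """NAV plus front-end load, with the load assumed to be nav * front_rate."""
    return nav + nav * front_rate


def redemption_proceeds(units, nav, back_rate):
    """Gross redemption amount minus the back-end load."""
    gross = units * nav
    return gross - gross * back_rate
```

For a fund with assets of 1,200,000, liabilities of 200,000, and 500,000 units, the NAV is 2.0; redeeming 1,000 units at a 0.5% fee pays out 1,990.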

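Valuation and Fees

What Is Valued

Assets: stocks, bonds, deposits, interest receivable;

Liabilities: management fees payable, taxes payable.

Operating Expenses

Management fee
Custody fee
Trading costs
Other: audit fees, legal fees, disclosure costs

Rate of Return

Rate of return = (change in NAV + income + capital gains) / NAV at the start of the period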

6.3.4 基金的评级

晨星于1984年成立于美国芝加哥。晨星中国2003年在深圳成立。

分为定性与定量评价。

五个关键因素：投研团队、投资方法、基金公司、业绩、费用。

对基金只做同类比较。

每个月进行一次评级，只对三年及以上的基金进行星级评价。

6.4 代表性基金分类

6.4.1 指数型基金

按当期指数的成分股比例购买的基金，追踪指数的变化幅度。

1971年，世界上第一个指数基金出现在美国。

1994~1996年，市场上91%的股票基金收益增长率低于标普500指数，指数基金优势开始显现。

当市场越有效时，被动化管理越有优势。

特点：

费用低廉：被动管理，投研少、调仓少，因此成本低。
分散风险：单一股票涨跌不会冲击指数基金整体表现。
延迟纳税：在发达国家，资本利得税很少。
监控较少：管理人不需要监控基金表现。

6.4.2 ETF基金

交易型开放式指数证券投资基金（Exchange Traded Fund）

跟踪标的指数的变化，且在证交所上市交易的基金。

实物申赎

ETF对应的是一揽子股票，采用实物申赎，而不是一般开放基金的份额申赎。

实物申赎ETF必须以一篮子股票换取基金份额，或者以基金份额换取一篮子股票。

最小申赎单位都是100万基金份额，通常门槛高，由机构投资者直接申赎。

因为实物申赎可以当天换取为股票，而股票可以当天买卖，因此，ETF基金实物申赎可以做到T+0交割。

6.4.3 LOF基金

我国本土创新，上市型开放式基金（Listed Open-ended Fund）。

LOF基金是开放式与封闭式基金功能的结合体：

因为是上市的，所以可以在二级市场进行交易（类似封闭式基金）；
因为是开放的，所以可以按份额申赎（类似开放式基金）。

6.4.4 QDII基金

合格境内机构投资者（Qualified Domestic Institutional Investor）。

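An investment fund set up within a country that, with approval from the relevant domestic authorities, invests in stocks, bonds, and other securities on overseas markets.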

6.4.5 保本基金（避险策略基金）

在基金的保本周期内，投资者可以拿回认购时原始本金。

不表示周期内可以保本，也不保证周期内申购可以保本。保本周期在中国一般为3年，在国外可达7~12年。

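Such funds generally put only the interest, or a very small share of assets, into high-risk investments, while most of the money is invested in fixed-income assets.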

6.4.6 专户理财

类似私募基金，由基金公司对多个客户提供投资管理服务。

6.4.7 私募投资基金

Private placement fund

向特定的对象募集基金份额。

私募基金无需披露信息，监管要求不严，比较隐蔽。

我国规定所有基金必须公募，因此信托成为私募基金的合法化主要渠道。国内的阳光私募基金一般是私募信托证券基金，主要投资二级证券市场，与重点投资一级股权市场的私募股权基金（PE,Private Equity）在投资对象上有所区别。

私募基金一般为“2+20”收费模型，收取2%管理费和20%盈利部分提成。

6.4.8 对冲基金

Hedge Fund

借助复杂资产组合与风险管理手段，投资多种资产，广泛运用杠杆、卖空以及衍生品等交易策略。

美国的对冲基金即一般意义上的私募基金。

对冲基金以合伙人制度为主，仅提供给有限的合格投资者。

6.4.9 互联网宝类基金

本质上是货币基金。

现逐步向各类非标的资产投资发展，包括：土地质押、ABS债券、P2P小额信贷等。

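7 Investment Return and Risk

7.1 Determinants of the Level of Interest Rates

The nominal interest rate $R$ is the growth rate of the amount of money;

the real interest rate $r$ is the growth rate of purchasing power.

With inflation rate $i$, the two are related by:

\[1+r = \frac{1+R}{1+i}\]

The equilibrium real rate of interest is the rate at which the supply of funds (household savings) equals the demand for funds (the real economy and investment). Anything that shifts supply or demand (fiscal policy: budget surpluses and deficits; monetary policy: central bank operations) shifts the equilibrium rate.

The equilibrium nominal rate of interest reflects the fact that when inflation rises, investors demand a higher nominal rate on their investments.

The Fisher equation gives the nominal rate investors require when expected inflation is $E(i)$:

\[R = r + E(i)\]

Effect of taxes on the real rate: taxes are levied on nominal income. With tax rate $t$, the after-tax real rate is:

\[R(1-t)-i\]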
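The three relations can be tried out numerically with a few Python helpers. The function names and the sample rates are illustrative; the after-tax formula uses the same subtraction approximation as the text.

```python
def real_rate(nominal, inflation):
    """Exact relation: 1 + r = (1 + R) / (1 + i)."""
    return (1 + nominal) / (1 + inflation) - 1


def fisher_nominal(real, expected_inflation):
    """Fisher equation: R = r + E(i)."""
    return real + expected_inflation


def after_tax_real_rate(nominal, tax_rate, inflation):
    """After-tax real rate: R(1 - t) - i."""
    return nominal * (1 - tax_rate) - inflation
```

With a nominal rate of 8% and inflation of 5%, the exact real rate is about 2.86%, slightly below the 3% that simple subtraction gives; at a 30% tax rate the after-tax real rate falls to 0.6%.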

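7.2 Measuring Returns

7.2.1 Holding-Period Return

The return over a holding period has two components:

Capital gain: the difference between the selling price and the buying price;
Dividends: for example, cash dividends and bonus payouts on a stock.

The holding-period return (HPR) is the rate of return over a given period:

\[r = HPR = \frac{p_t - p_0 + d}{p_0}\]

where $p_0$ is the price at the start of the holding period, $p_t$ is the price at the end, and $d$ is the dividend income.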
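A one-line Python helper makes the holding-period return concrete; the function name and the sample prices are illustrative.

```python
def holding_period_return(p0, pt, dividend=0.0):
    """HPR = (p_t - p_0 + d) / p_0."""
    return (pt - p0 + dividend) / p0
```

Buying at 100, selling at 110, and collecting a dividend of 4 gives an HPR of 14%.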

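7.2.2 Effective Annual Rate

The effective annual rate (EAR) is the percentage by which an investment grows in value over one year.

The one-year gross return ($1+EAR$) is the final value of each unit invested:

\[1 + EAR = [1 + r(T)]^{1/T}\]

so the effective annual rate $EAR$ is:

\[EAR = (1 + r(T))^{\frac{1}{T}} - 1\]

The total return over $T$ years, $r(T)$, is:

\[r(T) = (EAR + 1)^T - 1\]

and the total return over the investment period and the effective annual rate are related by:

\[(EAR + 1)^T = r(T) + 1\]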

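7.2.3 Annual Percentage Rate

The annual percentage rate (APR) takes the total return of an investment shorter than one year and annualizes it using simple interest.

In other words, the APR is an annualized simple rate:

\[r(T) = T \times APR\]

For example, a half-year Treasury with total return $r(T) = 1.63\%$ and $T = 0.5$ has $APR = 1.63\% \times 2$.

Total return, effective annual rate, and annual percentage rate are related by:

\[EAR + 1 = [1+r(T)]^{\frac{1}{T}}=(1+T \times APR)^{\frac{1}{T}}\]

so the effective annual rate can be written in terms of the APR:

\[EAR = (1 + T \times APR)^{\frac{1}{T}} - 1\]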
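The conversions between total return, EAR, and APR can be checked in Python with the half-year Treasury example from the text. The function names are illustrative.

```python
def ear_from_total(r_total, years):
    """EAR = (1 + r(T))**(1/T) - 1."""
    return (1 + r_total) ** (1 / years) - 1


def apr_from_total(r_total, years):
    """APR = r(T) / T (simple-interest annualization)."""
    return r_total / years


def ear_from_apr(apr, years):
    """EAR = (1 + T * APR)**(1/T) - 1."""
    return (1 + years * apr) ** (1 / years) - 1
```

For $r(T) = 1.63\%$ and $T = 0.5$, the APR is 3.26% while the EAR is about 3.287%; the gap is the effect of compounding the two half-year periods.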

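7.2.4 Continuous Compounding

Continuous compounding is the limiting case in which the number of compounding periods goes to infinity, so the interval between periods becomes infinitesimally small.

The continuously compounded rate can be thought of as the value of the annual percentage rate when the investment period is infinitesimally short.

The continuously compounded return is also called the log return.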
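A short Python sketch shows the limit numerically: compounding an APR more and more often approaches $e^{r} - 1$, and the log return is the rate that makes this exact. The function names are illustrative.

```python
import math


def ear_from_compounding(apr, periods_per_year):
    """EAR when the APR is compounded a fixed number of times per year."""
    return (1 + apr / periods_per_year) ** periods_per_year - 1


def ear_continuous(r_cc):
    """EAR implied by a continuously compounded rate: exp(r_cc) - 1."""
    return math.exp(r_cc) - 1


def log_return(p0, pt):
    """Continuously compounded (log) return between two prices."""
    return math.log(pt / p0)
```

At a 5% rate, daily compounding already lands within a few millionths of the continuous value.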

7.3 Risk and Its Measurement

7.3.1 General Classification of Risk

Risk can broadly be divided into:

Natural risk, political risk, transport risk, and so on;
Business: property rights, production, trading, and so on;
Finance:
1. Investment risk
  1. Market risk: interest rate, currency, equity, commodity
  2. Credit risk: sovereign, corporate, personal
  3. Liquidity risk: market, funding
  4. Operational risk: system & control, management failure, human error
  5. Event risk
2. Purchasing-power risk: inflation, currency, liquidity

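7.3.2 Measuring Risk

The expected return is the mathematical expectation of the return:

\[E(r) = \sum_s p(s) r(s)\]

where $s$ is a scenario, $p(s)$ is its probability, and $r(s)$ is the holding-period return in that scenario. A scenario is, for example, the state of the economy: a good and a bad economy correspond to different returns.

The excess return is, for any given period, the difference between the realized return on a risky asset and the realized risk-free rate.

The yield on short-term U.S. Treasury bills (T-bills) can be used as the risk-free rate. Longer-term Treasuries, although nearly free of credit risk, still carry purchasing-power risk such as inflation risk.

The risk premium is the difference between the expected holding-period return on a risky asset and the risk-free rate.

The reward-to-volatility ratio, such as the Sharpe ratio, is:

\[S = \frac{\text{Risk Premium}}{\text{SD of Excess Return}}\]

The Sharpe ratio is the risk premium divided by the standard deviation of the excess return.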
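Both quantities are easy to compute from data. The Python sketch below estimates the Sharpe ratio from historical series, using the mean excess return as the risk premium and the sample standard deviation as the risk measure; these estimator choices, the function names, and the sample numbers are assumptions for illustration.

```python
import statistics


def expected_return(scenarios):
    """E(r) = sum of p(s) * r(s) over (probability, return) scenarios."""
    return sum(p * r for p, r in scenarios)


def sharpe_ratio(returns, risk_free):
    """Mean excess return divided by the sample standard deviation of excess returns."""
    excess = [r - rf for r, rf in zip(returns, risk_free)]
    return statistics.mean(excess) / statistics.stdev(excess)
```

For scenarios of +30% (probability 0.25), +10% (0.5), and -10% (0.25), the expected return is 10%.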

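7.4 Time-Series Analysis of Returns

7.4.1 Returns

Arithmetic mean return:

\[E(r) = \sum_{s=1}^{n}p(s)r(s) = \frac{1}{n}\sum_{s=1}^{n}r(s)\]

Geometric mean return:

\[TV_n = (1+r_1)(1+r_2)\cdots(1+r_n)\]

\[g = TV_n^{1/n} - 1\]

Either can be used as an estimate of the expected return.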
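The gap between the two averages shows up clearly on a volatile series. The Python helpers below follow the formulas above; the function names and the sample returns are illustrative.

```python
def arithmetic_mean_return(returns):
    """Equal-weighted average of per-period returns."""
    return sum(returns) / len(returns)


def geometric_mean_return(returns):
    """g = TV_n**(1/n) - 1, where TV_n is the product of (1 + r_k)."""
    terminal_value = 1.0
    for r in returns:
        terminal_value *= 1 + r
    return terminal_value ** (1 / len(returns)) - 1
```

A gain of 50% followed by a loss of 50% has an arithmetic mean of zero, yet the terminal value is 0.75, so the geometric mean is about -13.4% per period.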

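In practice, the older the history, the less it tells us about the present; when estimating from a past time series, structural changes in the system must be taken into account.

Ideally, returns are fitted with a normal distribution, so only the mean and variance of the return need to be estimated from the past time series.

In reality, rare events bring risk and returns can deviate from the normal distribution. The standard deviation is then no longer a perfect measure of risk, and the Sharpe ratio is no longer a perfect measure of a security's performance. Correcting for non-normality requires looking at skewness and kurtosis.

Yahoo Finance provides historical stock price series for domestic and foreign markets.

7.4.2 Value at Risk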

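Fixing spaCy's Failure to Download Pretrained Models

2021-01-14T07:48:00.000Z

spaCy is a very handy natural language processing tool, but for some reason its official pretrained models can no longer be downloaded normally: the connection is reset and the download fails with ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')). To work around this, you can try downloading the model from a manually chosen link.

Fixing spaCy's Failure to Download Pretrained Models

1 Problem

Research involving natural language processing requires a series of processing steps on the text data, which is why spaCy is needed.

Recently, however, downloading spaCy's pretrained models from the command line runs into a connection reset, which makes the tool unusable.

The full error is: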

(pytorch) shenjiayun@server3 ~/Dev/VisualEntailment $ python -m spacy download en_core_web_sm
Traceback (most recent call last):
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/connectionpool.py", line 706, in urlopen
    chunked=chunked,
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/connectionpool.py", line 382, in _make_request
    self._validate_conn(conn)
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/connectionpool.py", line 1010, in _validate_conn
    conn.connect()
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/connection.py", line 421, in connect
    tls_in_tls=tls_in_tls,
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/util/ssl_.py", line 429, in ssl_wrap_socket
    sock, context, tls_in_tls, server_hostname=server_hostname
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/util/ssl_.py", line 472, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/ssl.py", line 423, in wrap_socket
    session=session
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/ssl.py", line 870, in _create
    self.do_handshake()
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/ssl.py", line 1139, in do_handshake
    self._sslobj.do_handshake()
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/connectionpool.py", line 756, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/util/retry.py", line 531, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/packages/six.py", line 734, in reraise
    raise value.with_traceback(tb)
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/connectionpool.py", line 706, in urlopen
    chunked=chunked,
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/connectionpool.py", line 382, in _make_request
    self._validate_conn(conn)
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/connectionpool.py", line 1010, in _validate_conn
    conn.connect()
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/connection.py", line 421, in connect
    tls_in_tls=tls_in_tls,
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/util/ssl_.py", line 429, in ssl_wrap_socket
    sock, context, tls_in_tls, server_hostname=server_hostname
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/util/ssl_.py", line 472, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/ssl.py", line 423, in wrap_socket
    session=session
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/ssl.py", line 870, in _create
    self.do_handshake()
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/ssl.py", line 1139, in do_handshake
    self._sslobj.do_handshake()
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/spacy/__main__.py", line 33, in <module>
    plac.call(commands[command], sys.argv[1:])
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/plac_core.py", line 348, in call
    cmd, result = parser.consume(arglist)
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/plac_core.py", line 217, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/spacy/cli/download.py", line 44, in download
    shortcuts = get_json(about.__shortcuts__, "available shortcuts")
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/spacy/cli/download.py", line 95, in get_json
    r = requests.get(url)
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

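2 Problem analysis

Not analyzed in depth here: the traceback only shows that the peer reset the connection during the TLS handshake while spacy download was fetching its list of available shortcuts.

3 Solution

Normally, spaCy and the best-matching pretrained model should be installed with the official commands.

pip install spacy
python -m spacy download en_core_web_sm

Here, however, the network connection is reset, so the model has to be installed manually.

spaCy also publishes its downloadable models on GitHub.

spaCy models
This repository contains releases of models for the spaCy NLP library. For more info on how to download, install and use the models, see the models documentation.

When automatic installation is not possible, you can manually choose and install a specific .tar.gz package, for example: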

# pip install .tar.gz archive from path or URL
pip install /Users/you/en_core_web_sm-2.1.0.tar.gz
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.1.0/en_core_web_sm-2.1.0.tar.gz

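With this approach you have to browse the release assets yourself and find a suitable pretrained model. For example, the latest en_core_web_sm model at the time of writing is en_core_web_sm-2.3.1.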
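The release URLs shown above follow a regular pattern, so the pip target can be assembled from a model name and version. The helper below is only a sketch: the function name `spacy_model_url` is hypothetical, and the URL layout is inferred from the two example URLs in this post rather than from any documented guarantee.

```python
def spacy_model_url(name: str, version: str) -> str:
    """Build the GitHub release URL for a spaCy model archive.

    Assumption: releases keep the <name>-<version>/<name>-<version>.tar.gz
    layout seen in the example URLs of this post.
    """
    base = "https://github.com/explosion/spacy-models/releases/download"
    package = f"{name}-{version}"
    return f"{base}/{package}/{package}.tar.gz"


if __name__ == "__main__":
    # Print the command to run manually in the shell.
    print("pip install " + spacy_model_url("en_core_web_sm", "2.3.1"))
```

The version still has to be chosen by hand so that it matches the installed spaCy release.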

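You can also determine the right pretrained model version from the official spaCy models page, taking en_core_web_sm as an example:

English
Available pretrained statistical models for English

Running the command then downloads and installs the pretrained model successfully: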

(pytorch) shenjiayun@server3 ~/Dev/VisualEntailment $ pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz
Collecting https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz (12.0 MB)
     |████████████████████████████████| 12.0 MB 427 kB/s 
Requirement already satisfied: spacy<2.4.0,>=2.3.0 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from en-core-web-sm==2.3.1) (2.3.5)
Requirement already satisfied: thinc<7.5.0,>=7.4.1 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (7.4.5)
Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (4.55.0)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (2.25.1)
Requirement already satisfied: setuptools in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (51.0.0.post20201207)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (2.0.4)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (3.0.2)
Requirement already satisfied: srsly<1.1.0,>=1.0.2 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (1.0.5)
Requirement already satisfied: plac<1.2.0,>=0.9.6 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (1.1.0)
Requirement already satisfied: numpy>=1.15.0 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (1.19.2)
Requirement already satisfied: catalogue<1.1.0,>=0.0.7 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (1.0.0)
Requirement already satisfied: blis<0.8.0,>=0.4.0 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (0.4.1)
Requirement already satisfied: wasabi<1.1.0,>=0.4.0 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (0.8.0)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (1.0.5)
Requirement already satisfied: importlib-metadata>=0.20 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from catalogue<1.1.0,>=0.0.7->spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (2.0.0)
Requirement already satisfied: zipp>=0.5 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from importlib-metadata>=0.20->catalogue<1.1.0,>=0.0.7->spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (3.4.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (1.26.2)
Requirement already satisfied: chardet<5,>=3.0.2 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (4.0.0)
Requirement already satisfied: certifi>=2017.4.17 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (2020.12.5)
Requirement already satisfied: idna<3,>=2.5 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (2.10)
Building wheels for collected packages: en-core-web-sm
  Building wheel for en-core-web-sm (setup.py) ... done
  Created wheel for en-core-web-sm: filename=en_core_web_sm-2.3.1-py3-none-any.whl size=12047106 sha256=dd9f847a5f35d1760f70b07ab8e8a663ae2a10364a8d43ee93d2b0eab246de3d
  Stored in directory: /home/shenjiayun/.cache/pip/wheels/b7/0d/f0/7ecae8427c515065d75410989e15e5785dd3975fe06e795cd9
Successfully built en-core-web-sm
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-2.3.1

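Fixing IPython tab completion failure and crash caused by a jedi TypeError

2020-12-31T12:13:11.000Z

In IPython 7.19.0, tab completion stops working and pressing Enter produces a long traceback from jedi: TypeError: __init__() got an unexpected keyword argument 'column'. This post tracks down the problem and gives a solution.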

1 Problem description

With the latest IPython 7.19.0, Tab completion does not work, and pressing Enter triggers an error. For example:

(pytorch) shenjiayun@server3 ~/Dev/VisualEntailment $ ipython
Python 3.7.9 (default, Aug 31 2020, 12:42:55) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import torch

In [2]: x = torch.rand([4, 36, 2048])
Traceback (most recent call last):
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/IPython/terminal/ptutils.py", line 113, in get_completions
    yield from self._get_completions(body, offset, cursor_position, self.ipy_completer)
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/IPython/terminal/ptutils.py", line 129, in _get_completions
    for c in completions:
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/IPython/core/completer.py", line 438, in _deduplicate_completions
    completions = list(completions)
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/IPython/core/completer.py", line 1818, in completions
    for c in self._completions(text, offset, _timeout=self.jedi_compute_type_timeout/1000):
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/IPython/core/completer.py", line 1862, in _completions
    full_text=full_text, cursor_line=cursor_line, cursor_pos=cursor_column)
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/IPython/core/completer.py", line 2030, in _complete
    cursor_pos, cursor_line, full_text)
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/IPython/core/completer.py", line 1374, in _jedi_matches
    text[:offset], namespaces, column=cursor_column, line=cursor_line + 1)
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/jedi/api/__init__.py", line 726, in __init__
    project=Project(Path.cwd()), **kwds)
TypeError: __init__() got an unexpected keyword argument 'column'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/shenjiayun/miniconda3/envs/pytorch/bin/ipython", line 11, in <module>
    sys.exit(start_ipython())
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/IPython/__init__.py", line 126, in start_ipython
    return launch_new_instance(argv=argv, **kwargs)
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/traitlets/config/application.py", line 845, in launch_instance
    app.start()
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/IPython/terminal/ipapp.py", line 356, in start
    self.shell.mainloop()
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/IPython/terminal/interactiveshell.py", line 564, in mainloop
    self.interact()
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/IPython/terminal/interactiveshell.py", line 547, in interact
    code = self.prompt_for_code()
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/IPython/terminal/interactiveshell.py", line 475, in prompt_for_code
    **self._extra_prompt_options())
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/prompt_toolkit/shortcuts/prompt.py", line 1013, in prompt
    return self.app.run(set_exception_handler=set_exception_handler)
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/prompt_toolkit/application/application.py", line 817, in run
    self.run_async(pre_run=pre_run, set_exception_handler=set_exception_handler)
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
    return future.result()
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/prompt_toolkit/application/application.py", line 783, in run_async
    return await _run_async2()
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/prompt_toolkit/application/application.py", line 771, in _run_async2
    await self.cancel_and_wait_for_background_tasks()
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/prompt_toolkit/application/application.py", line 872, in cancel_and_wait_for_background_tasks
    await task
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/prompt_toolkit/buffer.py", line 1854, in new_coroutine
    await coroutine(*a, **kw)
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/prompt_toolkit/buffer.py", line 1684, in async_completer
    document, complete_event
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/prompt_toolkit/completion/base.py", line 270, in get_completions_async
    document, complete_event
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/prompt_toolkit/completion/base.py", line 196, in get_completions_async
    for item in self.get_completions(document, complete_event):
  File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/IPython/terminal/ptutils.py", line 116, in get_completions
    exc_type, exc_value, exc_tb = sys.exc_info()
NameError: name 'sys' is not defined

If you suspect this is an IPython 7.19.0 bug, please report it at:
    https://github.com/ipython/ipython/issues
or send an email to the mailing list at ipython-dev@python.org

You can print a more detailed traceback right now with "%tb", or use "%debug"
to interactively debug it.

Extra-detailed tracebacks for bug-reporting purposes can be enabled via:
    %config Application.verbose_crash=True

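2 Troubleshooting

IPython uses jedi as its code completion engine, so when completion stops working, jedi is the likely culprit.

In the long traceback, the error is located at jedi/api/__init__.py, line 726, in __init__, with the message TypeError: __init__() got an unexpected keyword argument 'column'. The failure does indeed occur when jedi is called.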
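This kind of TypeError appears whenever a caller keeps passing a keyword argument that the callee's constructor no longer accepts. The toy reproduction below illustrates only that mechanism; `OldInterpreter` and `NewInterpreter` are hypothetical stand-ins and not jedi's actual classes.

```python
class OldInterpreter:
    """Hypothetical older API that still accepts line/column keywords."""

    def __init__(self, code, namespaces, line=None, column=None):
        self.code = code


class NewInterpreter:
    """Hypothetical newer API in which those keywords were removed."""

    def __init__(self, code, namespaces):
        self.code = code


def call_like_old_client(cls):
    """Call the constructor the way an un-updated client would."""
    try:
        cls("x = 1", [{}], column=5, line=1)
        return "ok"
    except TypeError as exc:
        return str(exc)


if __name__ == "__main__":
    print(call_like_old_client(OldInterpreter))
    print(call_like_old_client(NewInterpreter))
```

The second call fails with the same "unexpected keyword argument 'column'" wording as the traceback, which is why the error points to a version mismatch between caller and library rather than to user code.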

Looking through the jedi open-source project turns up this issue:

IPython(<=7.19) incompatible with jedi 0.18.0 #1714

Relevant traceback reads as follows:

  File "../venv/lib/python3.8/site-packages/IPython/core/completer.py", line 2029, in _complete
    completions = self._jedi_matches(
  File "../venv/lib/python3.8/site-packages/IPython/core/completer.py", line 1373, in _jedi_matches
    interpreter = jedi.Interpreter(
  File "../venv/lib/python3.8/site-packages/jedi/api/__init__.py", line 725, in __init__
    super().__init__(code, environment=environment,
TypeError: __init__() got an unexpected keyword argument 'column'

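The jedi maintainer confirmed that the problem lies in how IPython, as a downstream application, calls jedi, and that it should be resolved by a new downstream release.

davidhalter commented 6 days ago
I think we should continue the discussion in ipython/ipython#12740. IMO this is an downstream issue and they should just do a new release.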

3 Solution

Since the current latest IPython 7.19.0 cannot call the latest jedi 0.18.0 correctly, downgrading jedi to 0.17 resolves the problem.
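The incompatibility rule from the issue title can be written as a small check. This is a sketch under assumptions: the function names are hypothetical, only plain numeric version strings are handled, and treating every jedi release from 0.18 onward as affected is an extrapolation (the issue itself only names jedi 0.18.0).

```python
def parse_version(text):
    """Turn '7.19.0' into (7, 19, 0); only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def is_broken_combo(ipython_version, jedi_version):
    """Return True for IPython <= 7.19 combined with jedi >= 0.18.

    Assumption: jedi releases after 0.18.0 behave like 0.18.0 here.
    """
    ipy = parse_version(ipython_version)[:2]
    jedi = parse_version(jedi_version)[:2]
    return ipy <= (7, 19) and jedi >= (0, 18)


if __name__ == "__main__":
    print(is_broken_combo("7.19.0", "0.18.0"))
    print(is_broken_combo("7.19.0", "0.17.2"))
```

The version strings would come from the environment, for example from the IPython startup banner and the package list of the conda environment.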

List the available jedi versions with conda:

conda search jedi

The output:

(pytorch) C:\Users\jyshen>conda search jedi
Loading channels: done
# Name                       Version           Build  Channel
jedi                           0.8.1          py26_0  pkgs/free
jedi                           0.8.1          py27_0  pkgs/free
jedi                           0.8.1          py33_0  pkgs/free
jedi                           0.8.1          py34_0  pkgs/free
jedi                           0.9.0          py26_0  pkgs/free
jedi                           0.9.0          py27_0  pkgs/free
jedi                           0.9.0          py27_1  pkgs/free
jedi                           0.9.0          py33_0  pkgs/free
jedi                           0.9.0          py34_0  pkgs/free
jedi                           0.9.0          py34_1  pkgs/free
jedi                           0.9.0          py35_0  pkgs/free
jedi                           0.9.0          py35_1  pkgs/free
jedi                           0.9.0          py36_1  pkgs/free
jedi                          0.10.2          py27_0  pkgs/free
jedi                          0.10.2          py27_2  pkgs/free
jedi                          0.10.2  py27h4f12af3_0  pkgs/main
jedi                          0.10.2          py35_0  pkgs/free
jedi                          0.10.2          py35_2  pkgs/free
jedi                          0.10.2  py35h3350e2d_0  pkgs/main
jedi                          0.10.2          py36_0  pkgs/free
jedi                          0.10.2          py36_2  pkgs/free
jedi                          0.10.2  py36hed927a0_0  pkgs/main
jedi                          0.11.0          py27_1  pkgs/main
jedi                          0.11.0          py27_2  pkgs/main
jedi                          0.11.0  py27h53c0d9b_0  pkgs/main
jedi                          0.11.0          py35_1  pkgs/main
jedi                          0.11.0          py35_2  pkgs/main
jedi                          0.11.0  py35hc856aec_0  pkgs/main
jedi                          0.11.0          py36_1  pkgs/main
jedi                          0.11.0          py36_2  pkgs/main
jedi                          0.11.0  py36hc338079_0  pkgs/main
jedi                          0.11.1          py27_0  pkgs/main
jedi                          0.11.1          py27_1  pkgs/main
jedi                          0.11.1          py35_0  pkgs/main
jedi                          0.11.1          py35_1  pkgs/main
jedi                          0.11.1          py36_0  pkgs/main
jedi                          0.11.1          py36_1  pkgs/main
jedi                          0.12.0          py27_0  pkgs/main
jedi                          0.12.0          py27_1  pkgs/main
jedi                          0.12.0          py35_0  pkgs/main
jedi                          0.12.0          py35_1  pkgs/main
jedi                          0.12.0          py36_0  pkgs/main
jedi                          0.12.0          py36_1  pkgs/main
jedi                          0.12.0          py37_1  pkgs/main
jedi                          0.12.1          py27_0  pkgs/main
jedi                          0.12.1          py35_0  pkgs/main
jedi                          0.12.1          py36_0  pkgs/main
jedi                          0.12.1          py37_0  pkgs/main
jedi                          0.13.1          py27_0  pkgs/main
jedi                          0.13.1          py36_0  pkgs/main
jedi                          0.13.1          py37_0  pkgs/main
jedi                          0.13.2          py27_0  pkgs/main
jedi                          0.13.2          py36_0  pkgs/main
jedi                          0.13.2          py37_0  pkgs/main
jedi                          0.13.3          py27_0  pkgs/main
jedi                          0.13.3          py36_0  pkgs/main
jedi                          0.13.3          py37_0  pkgs/main
jedi                          0.14.1          py27_0  pkgs/main
jedi                          0.14.1          py36_0  pkgs/main
jedi                          0.14.1          py37_0  pkgs/main
jedi                          0.14.1          py38_0  pkgs/main
jedi                          0.15.1          py27_0  pkgs/main
jedi                          0.15.1          py36_0  pkgs/main
jedi                          0.15.1          py37_0  pkgs/main
jedi                          0.15.1          py38_0  pkgs/main
jedi                          0.15.2          py27_0  pkgs/main
jedi                          0.15.2          py36_0  pkgs/main
jedi                          0.15.2          py37_0  pkgs/main
jedi                          0.15.2          py38_0  pkgs/main
jedi                          0.16.0          py36_0  pkgs/main
jedi                          0.16.0          py36_1  pkgs/main
jedi                          0.16.0          py37_0  pkgs/main
jedi                          0.16.0          py37_1  pkgs/main
jedi                          0.16.0          py38_0  pkgs/main
jedi                          0.16.0          py38_1  pkgs/main
jedi                          0.17.0          py36_0  pkgs/main
jedi                          0.17.0          py37_0  pkgs/main
jedi                          0.17.0          py38_0  pkgs/main
jedi                          0.17.1          py36_0  pkgs/main
jedi                          0.17.1          py37_0  pkgs/main
jedi                          0.17.1          py38_0  pkgs/main
jedi                          0.17.2          py36_0  pkgs/main
jedi                          0.17.2  py36haa95532_1  pkgs/main
jedi                          0.17.2          py37_0  pkgs/main
jedi                          0.17.2  py37haa95532_1  pkgs/main
jedi                          0.17.2          py38_0  pkgs/main
jedi                          0.17.2  py38haa95532_1  pkgs/main
jedi                          0.17.2  py39haa95532_1  pkgs/main
jedi                          0.18.0  py36haa95532_0  pkgs/main
jedi                          0.18.0  py37haa95532_0  pkgs/main
jedi                          0.18.0  py38haa95532_0  pkgs/main
jedi                          0.18.0  py39haa95532_0  pkgs/main

Install the chosen jedi version with conda:

conda install jedi=0.17

Testing IPython again, the problem no longer occurs.

Notes on cloud native and microservices concepts

2020-12-28T09:16:14.000Z

An introduction to the concepts of cloud native and microservices.

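Reference book:

Zhu Rongxin, Huang Dixuan, Zhang Tian. Go语言高并发与微服务实战 (High Concurrency and Microservices in Practice with Go) [M]. Beijing: China Railway Publishing House Co., Ltd., 2020.

1 Cloud native architecture

1.1 History of cloud computing

1.1.1 The foundation of cloud computing: virtualization

Virtualization is the cornerstone of cloud computing.

In 1955, John McCarthy of MIT proposed time-sharing.
In June 1959, Christopher Strachey presented the paper "Time Sharing in Large Fast Computers" at the International Conference on Information Processing, introducing the concept of virtualization.
In August 1965, IBM introduced TSS (Time Sharing System) and VMM (Virtual Machine Monitor), the earliest form of virtual machine technology.
In the mid-1960s, the American scientist J.C.R. Licklider proposed a system of interconnected computers, and Bob Taylor and Larry Roberts went on to develop ARPANET.
In 1978, IBM obtained a RAID patent, which pools physical devices into a LUN (Logical Unit Number) and brought virtualization to storage for the first time.
In 1990, the concept of Utility Computing was revived, also known as Grid Computing, aiming to offer computing as a public service to users worldwide.
In 1998, VMware was founded and introduced virtualization to the x86 platform.
In 2000, the IEEE issued a draft standard for VPN (Virtual Private Network).
In 2002, Amazon launched AWS (Amazon.com Web Service), intended to expose its product catalog through a SOAP interface.
In 2005, the open-source hypervisor Xen 3.0 was released, with support for Intel VT and IA64.
In October 2006, the Israeli startup Qumranet announced KVM, and the source code of the KVM module became part of the Linux kernel source.
In April 2009, VMware released VMware vSphere, the first cloud operating system.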

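1.1.2 Cloud computing based on virtual machines

As virtualization matured, the cloud computing market emerged.

In 2006, AWS launched S3 (Simple Storage Service) and EC2 (Elastic Compute Cloud).
In 2007, IBM released a commercial cloud computing solution and announced the Blue Cloud initiative.
In 2008, Google App Engine was released for web development and hosting.
In 2009, Heroku launched the first public cloud PaaS (Platform as a Service).
In 2010, Microsoft launched Azure.
In 2010, Rackspace Hosting and NASA launched the OpenStack open-source cloud software initiative.
In 2011, Pivotal released the open-source PaaS Cloud Foundry.
In 2013, Docker was released; it used LXC and wrapped it with additional features.
In 2014, AWS launched Lambda, which runs code directly in AWS without any server configuration or management, known as FaaS (Function as a Service) or Serverless.

Cloud service models:

IaaS: Infrastructure as a Service. Provides the basic infrastructure resources.
SaaS: Software as a Service. Provides software together with its setup, rollout and maintenance; ready to use out of the box.
PaaS: Platform as a Service. An extension of SaaS: it abstracts the hardware and operating system and offers a runtime environment as a deployment platform, which makes scaling easier.

Layer	Traditional IT	IaaS	PaaS	SaaS
Applications	×	×	×	√
Data	×	×	×	√
Runtime	×	×	√	√
Middleware	×	×	√	√
Operating system	×	×	√	√
Virtualization	×	√	√	√
Servers	×	√	√	√
Storage	×	√	√	√
Networking	×	√	√	√

× means the cloud provider is not responsible for the layer; √ means the cloud provider is responsible for it.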
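The responsibility table can also be read as a single cut-off per service model: everything from a given layer downward is managed by the provider. The sketch below encodes exactly the table above; the English layer names and the helper names are illustrative choices, not terms from the book.

```python
# Layers from top (application) to bottom (network), as in the table.
LAYERS = [
    "Applications", "Data", "Runtime", "Middleware", "Operating system",
    "Virtualization", "Servers", "Storage", "Networking",
]

# Index of the first layer (top-down) that the provider manages.
FIRST_MANAGED = {
    "Traditional IT": len(LAYERS),  # provider manages nothing
    "IaaS": 5,                      # from Virtualization downward
    "PaaS": 2,                      # from Runtime downward
    "SaaS": 0,                      # everything
}


def provider_managed(model):
    """Layers the cloud provider is responsible for under the given model."""
    return LAYERS[FIRST_MANAGED[model]:]


def customer_managed(model):
    """Layers that remain the customer's responsibility."""
    return LAYERS[:FIRST_MANAGED[model]]


if __name__ == "__main__":
    for model in FIRST_MANAGED:
        print(model, "->", customer_managed(model))
```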

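1.1.3 Containerization and container orchestration

Containerization is essentially an improvement on virtualization.

Virtualization separates operating systems through a hypervisor, whereas containers share the host operating system.

LXC (Linux Containers) focuses on resource isolation and limits for the container runtime environment, similar to a process sandbox, but it has no container image packaging, so it never became widespread.

Docker builds an image packaging and runtime mechanism on top of LXC: the application and its dependencies are packaged into an image file that also runs on any other Docker host, achieving "Build, Ship and Run".

Container orchestration went through a three-way competition among Mesos, Swarm and Kubernetes. As Kubernetes matured and integrated with Docker, the mainstream PaaS route shifted to Kubernetes + Docker. By 2018, Kubernetes had become dominant.

1.1.4 Summary of the evolution of cloud computing

Enterprises reduce their direct investment in IT infrastructure and instead obtain compute and storage capacity by moving to the cloud, paying by time and by usage.

Cloud computing lowers IT spending and lowers the technical barrier to entry across the industry.

1.2 What cloud native is

1.2.1 Background

In the mobile internet era, businesses grow fast and iterate rapidly.

1.2.2 Definitions of cloud native

Pivotal (which coined the term cloud native application):

DevOps
Continuous integration
Microservice architecture
Containerization

CNCF (Cloud Native Computing Foundation):

Containerized applications
Microservice-oriented architecture
Applications that support container orchestration and scheduling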

Mission of the Cloud Native Computing Foundation
The Foundation’s mission is to make cloud native computing ubiquitous. The CNCF Cloud Native Definition v1.0 says:
Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.
These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.
The Cloud Native Computing Foundation seeks to drive adoption of this paradigm by fostering and sustaining an ecosystem of open source, vendor-neutral projects. We democratize state-of-the-art patterns to make these innovations accessible for everyone.

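1.2.3 Cloud native and the twelve factors

In 2012, Heroku proposed the 12-Factor design methodology for cloud applications.

Codebase: one codebase tracked in version control, many deploys.
Dependencies: explicitly declare dependencies and isolate them with tooling (Maven, Bundler, NPM, etc.) so that the app does not depend on the deployment environment.
Config: apply configuration to each deployment environment through operating-system-level environment variables.
Backing services: treat backing services as attached resources.
Build, release, run: strictly separate the build and run stages.
Processes: execute the app as one or more stateless processes. Any persistent data is stored in a backing service.
Port binding: the app is completely self-contained and provides its service by binding to a port and listening for incoming requests, without relying on an external web server.
Concurrency: scale out via the process model, adding application processes horizontally.
Disposability: fast startup and graceful shutdown maximize robustness, which covers fast and elastic scaling, deploying changes and recovering from crashes.
Dev/prod parity: keep development, staging and production as similar as possible to enable continuous delivery and deployment.
Logs: treat logs as event streams, and collect, aggregate, search and analyze them through a centralized service.
Admin processes: run admin and management tasks, such as database migrations, as one-off processes.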
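The Config factor is the easiest of the twelve to show in code: the same build reads its settings from environment variables, so only the environment differs between deployments. A minimal sketch follows; the variable names `DATABASE_URL`, `PORT` and `DEBUG` and their defaults are assumptions chosen for illustration.

```python
import os


def load_config(env=os.environ):
    """Read settings from environment variables, with development defaults.

    The variable names and defaults are illustrative assumptions.
    """
    return {
        "database_url": env.get("DATABASE_URL", "sqlite:///local.db"),
        "port": int(env.get("PORT", "8000")),
        "debug": env.get("DEBUG", "false").lower() == "true",
    }


if __name__ == "__main__":
    # The same code yields different settings per deployment environment.
    print(load_config())
    print(load_config({"PORT": "9000", "DEBUG": "true"}))
```

Passing the mapping as a parameter keeps the function testable without touching the real process environment.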

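Core ideas:

Use declarative formats to set up automation. (low learning cost)
Keep a clean contract with the underlying operating system. (high portability)
Suit deployment on modern cloud platforms. (no extra administration needed)
Minimize divergence between development and production. (continuous deployment, high agility)
Scale without significant changes to tooling, architecture or development practices.

1.3 Cloud native infrastructure

Cloud native applications use representative technologies such as microservices, service meshes, containers, DevOps and declarative APIs to build loosely coupled systems that are fault tolerant, easy to manage and easy to observe.

1.3.1 Microservices

Clearly defined functionality is split into smaller services that are loosely coupled, and each service can be iterated independently.

Advantages: lower system complexity, independent deployment, independent scaling, and polyglot programming.

Disadvantages: dozens of independent services have to be built, tested, deployed and run, multiple languages and environments have to be supported, and the complexity of distributed systems is introduced, such as network latency, fault tolerance, message serialization, unreliable networks, asynchrony, versioning and heterogeneity.

1.3.2 Containers

A microservice is packaged together with all the configuration, dependencies and environment variables it needs into a container image, which can easily be moved to a new server node.

Deploying and operating containers by hand is too costly. Adding Kubernetes on top of Docker automates deployment, scaling and maintenance of container clusters.

Kubernetes supports not only Docker but also other container technologies such as Rocket (rkt).

1.3.3 Service mesh

Microservice architectures come in two styles:

Intrusive architecture: the service framework is embedded in the program code, and developers combine components such as RPC, load balancing and circuit breaking.
Non-intrusive architecture: a proxy is deployed alongside the application and takes over its network traffic transparently, so developers only care about their own business logic.

A service mesh is transparent to the cloud native applications running on it.

A service mesh is an infrastructure layer for handling service-to-service communication. It is responsible for reliably delivering requests through the complex topology of services that make up a modern cloud native application. In practice, a service mesh is usually implemented as an array of lightweight network proxies that are deployed alongside the application code, without the application needing to be aware of them.

Open-source service mesh software includes Istio, Linkerd, Envoy and Dubbo Mesh. A service mesh can run on Kubernetes.

1.3.4 DevOps

DevOps covers three parts:

Development
Testing
Operations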

DevOps

Dev
1. Plan
2. Create
3. Verify
4. Package
Ops
1. Release
2. Configure
3. Monitor

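1.4 Summary

Cloud native shifts the goal of the cloud from saving IT cost to driving business growth.

2 Overview of microservices

2.1 Evolution of system architecture

2.1.1 Monolithic architecture

A monolithic application is easy to test and deploy, but it compiles slowly, any local change requires redeploying the whole application, and the technology stack is hard to extend.

2.1.2 Vertically layered architecture

The monolith is split vertically, for example into a user interface layer, a business logic layer and a data access layer.

2.1.3 Service-oriented architecture (SOA)

Every service is registered with a service registry.

A service consumer looks up a service in the registry and invokes it by sending a message, which the Enterprise Service Bus (ESB) transforms and forwards to the target service.

SOA is a centralized architecture that focuses on system integration.

2.1.4 Microservice architecture

A large, complex piece of software is composed of one or more microservices. Each microservice is independently deployable, loosely coupled and focused on a single responsibility, and each responsibility represents a highly cohesive business capability.

Microservices are a decentralized architecture that focuses on distributed governance, code reuse and rapid scaling.

Characteristics of a microservice architecture:

The service layer of the system is split into individual microservices.
Each microservice follows the single responsibility principle.
Microservices communicate over lightweight protocols such as RESTful HTTP.
Microservices are deployed with container technology and run in their own processes.
Each microservice has its own development activities and release cycle.

If a system is split into too many services, the cost of service governance rises sharply, and so does the cost of development and debugging. Services that depend on each other can also form complex dependency chains, so that a single failure cascades into an avalanche effect.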

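2.1.5 Cloud native architecture

Representative technologies:

Containers
Service mesh
Microservices
Immutable infrastructure
Declarative APIs

Four elements:

Microservices
Containerization
DevOps
Continuous delivery

Cloud native architecture relies on PaaS products:

Codeless: service development.
Applicationless: service release.
Serverless: service operations.

2.2 Common microservice frameworks

2.2.1 Spring Cloud and Dubbo in Java

Spring Cloud combines the relatively mature service frameworks developed by various companies and re-wraps them in the Spring Boot style, hiding complex configuration and implementation details and exposing a simple, easy-to-understand toolkit.

Dubbo is a distributed service framework that provides an RPC solution and an SOA service governance solution. Its main features are remote communication, cluster fault tolerance and automatic discovery.

2.2.2 Go kit and Go Micro in Go

Go kit (gokit.io) is a collection of Go packages.

Go kit is not only a microservice toolkit; it is also well suited to building cleanly architected applications.

Go kit application architecture:

Transport layer: network communication over HTTP, gRPC and so on, or a publish/subscribe system such as NATS.
Endpoint layer: each method the service exposes is defined as an endpoint, and endpoints serve requests through the transport layer.
Service layer: business logic, with no concern for transport or encoding.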
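The three layers can be sketched in a few lines. The example below is written in Python purely to illustrate the layering idea; it is not Go kit's API, and all names (`AddService`, `make_add_endpoint`, `make_json_transport`) are hypothetical.

```python
import json


class AddService:
    """Service layer: pure business logic, unaware of transport or encoding."""

    def add(self, a, b):
        return a + b


def make_add_endpoint(service):
    """Endpoint layer: adapt one service method to a request/response shape."""
    def endpoint(request):
        return {"sum": service.add(request["a"], request["b"])}
    return endpoint


def make_json_transport(endpoint):
    """Transport layer: decode the wire format, call the endpoint, encode."""
    def handle(raw_body):
        request = json.loads(raw_body)
        response = endpoint(request)
        return json.dumps(response)
    return handle


# Wire the layers together from the inside out.
handle = make_json_transport(make_add_endpoint(AddService()))

if __name__ == "__main__":
    print(handle('{"a": 1, "b": 2}'))
```

Because the service knows nothing about JSON, the same `AddService` could be exposed over another transport by swapping only the outermost function.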

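Go Micro is a pluggable RPC microservice framework written in Go. It consists of these components:

Registry: service discovery, resolving service names to service addresses.
Selector: a load balancing component built on top of the Registry.
Broker: the publish/subscribe component, providing asynchronous communication between services through message middleware.
Transport: synchronous communication between services.
Codec: encoding and decoding of messages between services.
Server: the service itself.
Client: the client used to access microservices.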
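How the Registry and Selector cooperate can be shown with an in-memory toy. This Python sketch only mirrors the roles described above; it is not Go Micro's interface, and the round-robin strategy, class names and addresses are assumptions made for illustration.

```python
class Registry:
    """Service discovery: maps a service name to its known addresses."""

    def __init__(self):
        self._services = {}

    def register(self, name, address):
        self._services.setdefault(name, []).append(address)

    def lookup(self, name):
        return list(self._services.get(name, []))


class RoundRobinSelector:
    """Load balancing on top of the registry, one node per call in turn."""

    def __init__(self, registry):
        self._registry = registry
        self._counters = {}

    def select(self, name):
        nodes = self._registry.lookup(name)
        if not nodes:
            raise LookupError(f"no nodes registered for {name!r}")
        index = self._counters.get(name, 0)
        self._counters[name] = index + 1
        return nodes[index % len(nodes)]


if __name__ == "__main__":
    registry = Registry()
    registry.register("user", "10.0.0.1:8080")
    registry.register("user", "10.0.0.2:8080")
    selector = RoundRobinSelector(registry)
    print([selector.select("user") for _ in range(3)])
```

A client would ask the selector for an address before each call, so instances can be added to or removed from the registry without changing client code.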

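2.3 Six principles of microservice design

High cohesion and low coupling
High autonomy
Business-centric
Design for resilience
Logging and monitoring
Automation

2.4 Domain-driven design

Domain Driven Design

It is divided into four layers:

Interface
Application
Domain
Infrastructure

Business system

Core domain: for example, the flash-sale operation itself.
Subdomains
- Supporting subdomain: for example, campaign management (creating and querying flash sales).
- Generic subdomain: for example, user authentication.

Bounded contexts map one-to-one to subdomains. Each bounded context uses exactly one ubiquitous language and keeps it clear and concise.

In practice, several bounded contexts are sometimes merged, depending on the business.

As microservice architectures became popular, organizations formed many small teams, and the organizational structure changed from a functional hierarchy to a flat cluster of small teams.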

Reading notes on The Long View

2020-12-23T05:21:39.000Z

I borrowed The Long View (《远见》) by the Canadian author Brian Fetherstonhaugh from the library; these notes record its core ideas.

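Planning in stages

A career is not a sprint. It lasts surprisingly long and can be divided into three stages:

Stage 1:
1. The first 15 years of a career
2. Goal: lay the foundation for the next two stages
3. Strategy: fuel up and start strong
Stage 2:
1. The middle 15 years of a career
2. Goal: find the intersection of your strengths, your passions and what the world needs, and do whatever it takes to stand out.
3. Strategy: anchor on your sweet spot and focus on your strengths
Stage 3:
1. The final years of a career
2. Goal: identify a successor, complete the succession plan, and transition into roles such as advisor or supporter.
3. Strategy: optimize the long tail and exert lasting influence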

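Stocking up on career fuel

A successful, sustainable career is powered by career fuel.

Accumulate it, keep renewing it, and spend it wisely.

The basic career fuels:

Transferable skills
1. Problem solving
2. Persuasive communication
3. Getting things done
4. "Talent magnetism"
5. Giving and asking for help
6. Emotional intelligence
Meaningful experiences
1. Diverse experiences that build new professional skills;
2. Trying different things in different environments and experimenting with different ways of working builds stronger decision-making skills.
Enduring relationships, i.e. your career ecosystem
1. Bosses
2. Clients
3. Business partners
4. Talented people around you
5. Your peers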

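Career mindset

Career length: the number of years until retirement.
Time needed to master a skill: at least 10,000 hours of intensive training and practice.
Share of personal wealth earned after age 40: 85% to 90%. For most people, wealth accumulation builds up until their 40s, 50s or even 60s before it takes off.
Social network friends: more is not necessarily better.
Number of career supporters: find 3 to 5 people who can truly be called mentors.

Strategies for entering the workforce

Use your time as a student to build up early forms of career fuel.
Draw up a job-hunting battle plan.
Take an active part in campus recruiting.
Apply online efficiently.
Make good use of your connections.
Do some homework before meeting a contact.
Be mentally prepared: landing the first job can be extremely hard.
Keep exploring.

Advice for first-time managers

Always mind your appearance, attitude and manners.
State your vision concisely, and keep repeating it.
Pick your team members as soon as possible.
Every meaningful business problem is best solved in a small team.
Act like a trusted problem solver.
You do not need to know everything; instead, ask others for advice often.

Traits of a CEO

Integrity and fit with the company culture.
Intellectual curiosity and agility.
A track record of improving business performance.
Authenticity, self-awareness and balance.
Energy and passion.

Advice for planning Stage 3 well

Experiment: volunteer for challenges.
Start a venture: open up new territory.
Manage your learning curve: stay relevant.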

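Combining a career with parenthood

Do not let career and parenthood become an either-or choice.
Find a family-friendly employer.
Find the right support at home.
Set realistic expectations and firm boundaries.
Manage your time and energy.

Getting back on track

Reframe your experience.
Repackage your skills.
Reconnect with your career ecosystem.
Rebuild your confidence.

Other

Facing competition from machines

The wise move is to cultivate emotional intelligence, creativity, collaboration and the skill of building trusting relationships.

Where to look for work

Online platforms such as LinkedIn will become the main place where companies find talent and individuals find jobs.

Where to invest your time

Entrepreneurship and freelancing will flourish in the near future, and career goals will become more diverse.

How to keep a stable income

Retiring does not guarantee a comfortable old age; continuing to work is what provides a stable income.

Enjoying work

To be happier at work, you need to raise your overall sense of well-being.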

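DistributedDataParallel (DDP) - Multi-process parallel computing in PyTorch

2020-12-17T11:59:51.000Z

PyTorch's DistributedDataParallel (DDP) implements multi-process parallel computing. Compared with the older single-process, multi-thread DataParallel, DDP supports distributed computing across multiple nodes. Even on a single machine with multiple GPUs, DDP usually performs better: it avoids the overhead of Python threads contending for the Global Interpreter Lock (GIL contention), and it does not need to repeatedly replicate and synchronize the model, scatter the input data and gather the model outputs during multi-GPU training.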

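Background

Python GIL

The GIL (Global Interpreter Lock) is a global mutex introduced by the CPython interpreter. CPython's memory management is not thread-safe, so this lock prevents multiple threads from executing Python bytecode at the same time. Because of it, only one thread of a multi-threaded Python program actually runs at any moment, which means threads cannot take full advantage of multiple processors.

From the official explanation:

The Python GIL
Python has one peculiarity that makes concurrent programming harder. It’s called the GIL, short for Global Interpreter Lock. The GIL makes sure there is, at any time, only one thread running. Because only one thread can run at a time, it’s impossible to use multiple processors with threads. But don’t worry, there’s a way around this.
The GIL was invented because CPython’s memory management is not thread-safe. With only one thread running at a time, CPython can rest assured there will never be race conditions.

Advantages of DistributedDataParallel over DataParallel

DistributedDataParallel (DDP) has several advantages over DataParallel (DP), in both functionality and performance:

Functionality:

DDP is built on multiple processes, so it supports distributed computing across multiple machines and multiple GPUs, whereas DP is single-process and multi-threaded, so at most it supports multiple GPUs on a single machine.
DDP works with model parallelism, where a model is split into several stages, which DP does not support yet.

Performance:

Because DDP is multi-process (one worker process per GPU is the usual recommendation), its parallel performance is not held back by GIL contention the way DP's single-process, multi-thread design is.
On a single machine with multiple GPUs, DP has to replicate the model across the GPUs frequently during training to keep it in sync, and it has to scatter the inputs and gather the outputs. DDP instead uses All-Reduce, a collective communication operation, to aggregate gradients, which performs better.

Overall, the functional advantages also serve the goal of making better use of the hardware.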
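The effect of All-Reduce on gradients can be illustrated without any GPU: every worker contributes its local gradient and every worker ends up with the same averaged result. The sketch below simulates only that outcome with plain Python lists. It is not the NCCL algorithm and makes no torch.distributed calls; the function name `all_reduce_mean` is hypothetical.

```python
def all_reduce_mean(per_worker_grads):
    """Simulate an averaging all-reduce over per-worker gradient lists.

    Input: one gradient list per worker, all of the same length.
    Output: one list per worker, each holding the element-wise mean.
    """
    world_size = len(per_worker_grads)
    # Reduce: sum each gradient element across workers.
    summed = [sum(values) for values in zip(*per_worker_grads)]
    averaged = [total / world_size for total in summed]
    # "All": every worker receives its own copy of the reduced result.
    return [list(averaged) for _ in range(world_size)]


if __name__ == "__main__":
    grads = [[1.0, 2.0], [3.0, 6.0]]  # two workers, two parameters each
    print(all_reduce_mean(grads))
```

Since every process applies the same averaged gradient to an identical copy of the model, the replicas stay in sync without the model being re-broadcast each step.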

How it works

For the underlying design, see:

Distributed Data Parallel - PyTorch master Documentation

There is also a 2020 paper:

Li S, Zhao Y, Varma R, et al. PyTorch distributed: experiences on accelerating data parallel training[J]. Proceedings of the VLDB Endowment, 2020, 13(12): 3005-3018.
PDF on vldb.org

The overall architecture looks like this:

Distributed System
- Node 0
  - Process0 [Global Rank=0, Local Rank=0] -> GPU 0-0
  - Process1 [Global Rank=1, Local Rank=1] -> GPU 0-1
  - Process2 [Global Rank=2, Local Rank=2] -> GPU 0-2
  - Process3 [Global Rank=3, Local Rank=3] -> GPU 0-3
- Node 1
  - Process4 [Global Rank=4, Local Rank=0] -> GPU 1-0
  - Process5 [Global Rank=5, Local Rank=1] -> GPU 1-1
  - Process6 [Global Rank=6, Local Rank=2] -> GPU 1-2
  - Process7 [Global Rank=7, Local Rank=3] -> GPU 1-3

This architecture uses the following terms and values:

N=2 Nodes
G=4 GPUs per node
W=8 Application processes across all nodes (aka. World Size)
L=4 Application processes on each node (aka. Local Size)
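The relationship between these numbers can be sketched in a few lines. The helper `global_rank` is hypothetical and assumes that ranks are assigned node by node, as in the diagram above.

```python
def global_rank(node_rank: int, local_rank: int, procs_per_node: int) -> int:
    """Global rank of a process, assuming ranks are assigned node by node."""
    return node_rank * procs_per_node + local_rank


N, L = 2, 4   # nodes, processes (one per GPU) on each node
W = N * L     # world size: processes across all nodes

layout = [
    (node, local, global_rank(node, local, L))
    for node in range(N)
    for local in range(L)
]
for node, local, rank in layout:
    print(f"Node {node}: global rank={rank}, local rank={local} -> GPU {node}-{local}")
```

The single-machine example in this post is the special case N=1, where the global rank and the local rank coincide.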

Usage

DataParallel only needs to be wrapped around the original model. DistributedDataParallel is multi-process by design, so the code is slightly more involved.

"""
Distributed Data Parallel (DDP) example

Author: HearyShen
Date:   2020.12.17
"""
import time
import os
import random
from argparse import ArgumentParser

import torch
import torch.utils.data as data
import torch.nn as nn
import torch.cuda as cuda
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.optim as optim
import torchvision.models as models

DIST_DEFAULT_BACKEND = 'nccl'
DIST_DEFAULT_ADDR = 'localhost'
DIST_DEFAULT_PORT = '12344'
DIST_DEFAULT_INIT_METHOD = f'tcp://{DIST_DEFAULT_ADDR}:{DIST_DEFAULT_PORT}'
DIST_DEFAULT_WORLD_SIZE = cuda.device_count()

DEFAULT_BATCH_SIZE = 64
DEFAULT_NUM_WORKERS_PER_GPU = 8


class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = nn.Linear(3, 2)

    def forward(self, x):
        out = self.mlp(x)
        return out


class TinyDataset(data.dataset.Dataset):
    def __getitem__(self, index):
        x = torch.rand([3, 224, 224])
        y = random.randint(0, 999)
        return x, y

    def __len__(self):
        return 10000


def worker(rank, args):
    model = models.resnet50(pretrained=True)
    if args.distributed:
        print(
            f"[{os.getpid()}] Initializing {rank}/{DIST_DEFAULT_WORLD_SIZE} at {DIST_DEFAULT_INIT_METHOD}"
        )

        # initialize with TCP in this example
        dist.init_process_group(backend=DIST_DEFAULT_BACKEND,
                                init_method=DIST_DEFAULT_INIT_METHOD,
                                world_size=DIST_DEFAULT_WORLD_SIZE,
                                rank=rank)

        # # Another way to initialize with environment variables
        # os.environ["MASTER_PORT"] = DIST_DEFAULT_PORT
        # os.environ["MASTER_ADDR"] = DIST_DEFAULT_ADDR
        # os.environ["WORLD_SIZE"] = str(DIST_DEFAULT_WORLD_SIZE)
        # os.environ["RANK"] = str(rank)
        # dist.init_process_group(backend=DIST_DEFAULT_BACKEND)

        print(
            f"[{os.getpid()}] Computing {rank}/{DIST_DEFAULT_WORLD_SIZE} at {DIST_DEFAULT_INIT_METHOD}"
        )
        # ensuring that each process exclusively works on a single GPU
        torch.cuda.set_device(rank)
        model.cuda(rank)
        # When using a single GPU per process and per
        # DistributedDataParallel, we need to divide the batch size
        # ourselves based on the total number of GPUs we have
        model = nn.parallel.DistributedDataParallel(model, device_ids=[rank])
    else:
        model = nn.DataParallel(model).cuda()

    loss_func = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=0.001)

    # dataset
    dataset = TinyDataset()
    dist_sampler = data.distributed.DistributedSampler(
        dataset) if args.distributed else None
    dataloader = data.dataloader.DataLoader(
        dataset,
        batch_size=DEFAULT_BATCH_SIZE // DIST_DEFAULT_WORLD_SIZE if args.distributed else DEFAULT_BATCH_SIZE,
        shuffle=(dist_sampler is None),
        num_workers=DEFAULT_NUM_WORKERS_PER_GPU if args.distributed else DEFAULT_NUM_WORKERS_PER_GPU * DIST_DEFAULT_WORLD_SIZE,
        sampler=dist_sampler)

    # train
    model = model.train()
    for epoch in range(2):
        if args.distributed:
            dist_sampler.set_epoch(epoch)
        for i, (x, label) in enumerate(dataloader):
            y = model(x)
            loss = loss_func(y, label.to(y.device))

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        if args.distributed:
            print(
                f"[{os.getpid()}] Epoch-{epoch} ended {rank}/{DIST_DEFAULT_WORLD_SIZE} at {DIST_DEFAULT_INIT_METHOD} on {y.device}"
            )
        else:
            print(f"[{os.getpid()}] Epoch-{epoch} ended on {y.device}")

    if args.distributed:
        print(
            f"[{os.getpid()}] Finishing {rank}/{DIST_DEFAULT_WORLD_SIZE} at {DIST_DEFAULT_INIT_METHOD} on {y.device}"
        )
        dist.destroy_process_group()


def launch(args):
    tic = time.time()
    if args.distributed:
        mp.spawn(worker,
                 args=(args, ),
                 nprocs=DIST_DEFAULT_WORLD_SIZE,
                 join=True)
    else:
        worker(None, args)
    toc = time.time()
    print(f"Finished in {toc-tic:.2f}s")


if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument("-d", "--distributed", action="store_true")
    args = parser.parse_args()

    launch(args)

Creating processes with spawn

def launch(args):
    tic = time.time()
    if args.distributed:
        mp.spawn(worker,
                 args=(args, ),
                 nprocs=DIST_DEFAULT_WORLD_SIZE,
                 join=True)
    else:
        worker(None, args)
    toc = time.time()
    print(f"Finished in {toc-tic:.2f}s")

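launch uses torch.multiprocessing.spawn (imported as mp) to create nprocs new processes. Each process runs the worker function and receives args as its arguments.

Note that spawn passes the process index i as the first argument, with i in [0, nprocs). The worker function therefore receives the argument list (i, args).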
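The calling convention can be illustrated without starting any processes. The sketch below uses a hypothetical stand-in, `spawn_sequential`, which calls the function once per index in the current process; it only mirrors how the index is prepended to the argument tuple and is not how torch.multiprocessing.spawn is implemented.

```python
from typing import Any, Callable, List, Tuple


def spawn_sequential(fn: Callable[..., Any], args: Tuple[Any, ...], nprocs: int) -> List[Any]:
    """Hypothetical stand-in for mp.spawn: calls fn(i, *args) for i in [0, nprocs).

    Runs sequentially in one process and collects return values, purely to
    show the argument layout (the real spawn starts separate processes).
    """
    return [fn(i, *args) for i in range(nprocs)]


def demo_worker(rank: int, config: dict) -> str:
    # The first positional argument is the process index supplied by spawn.
    return f"rank {rank} of {config['world_size']}"


results = spawn_sequential(demo_worker, args=({"world_size": 4},), nprocs=4)
```

This is why worker in the example above is declared as worker(rank, args) even though launch only passes args=(args, ).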

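Parallel work in the worker processes

Each worker process goes through the following stages:

Initialize the process group: the processes started in parallel must form a process group, which means they need to know where to rendezvous. DDP takes the model on the rank=0 process as the reference and automatically keeps the models on the other processes consistent with it.
Wrap the model with DDP: create the model, move it to the GPU that belongs to this process, and wrap it with DistributedDataParallel.
Prepare the data: create a DistributedSampler so that the DataLoader in each process feeds its model a distinct share of the data.
Train: each process handles its share of the training data in parallel, using the per-process batch_size given to the DataLoader.
Destroy the process group: once the work is done, tear down the process group.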

init_process_group

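See the official torch.distributed documentation:
Distributed communication package - torch.distributed

Signature of the initialization function:

torch.distributed.init_process_group(backend, init_method=None, timeout=datetime.timedelta(0, 1800), world_size=-1, rank=-1, store=None, group_name='')

The backend is the library that provides collective communication for the process group. PyTorch supports three backends: Gloo, MPI, and NCCL. The recommended practice is:

use NCCL for distributed GPU training;
use Gloo for distributed CPU training.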
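This rule of thumb fits in a small helper. The function `pick_backend` below is hypothetical and deliberately free of PyTorch imports; it only encodes the recommendation above.

```python
SUPPORTED_BACKENDS = ("gloo", "mpi", "nccl")


def pick_backend(use_gpu: bool) -> str:
    """Return the recommended backend name: NCCL for GPU training, Gloo for CPU training."""
    backend = "nccl" if use_gpu else "gloo"
    assert backend in SUPPORTED_BACKENDS
    return backend


gpu_backend = pick_backend(use_gpu=True)
cpu_backend = pick_backend(use_gpu=False)
```

In a real script, the flag would typically come from torch.cuda.is_available(), and the returned string would be passed as the backend argument of init_process_group.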

References:

Gloo
Gloo is a collective communications library. It comes with a number of collective algorithms useful for machine learning applications. These include a barrier, broadcast, and allreduce.

NVIDIA NCCL
The NVIDIA Collective Communication Library (NCCL) implements multi-GPU and multi-node communication primitives optimized for NVIDIA GPUs and Networking. NCCL provides routines such as all-gather, all-reduce, broadcast, reduce, reduce-scatter as well as point-to-point send and receive that are optimized to achieve high bandwidth and low latency over PCIe and NVLink high-speed interconnects within a node and over NVIDIA Mellanox Network across nodes.

Initialization can either specify the rendezvous address and port through init_method, or pass a key-value store shared by all processes through store.

# initialize with TCP in this example
dist.init_process_group(backend=DIST_DEFAULT_BACKEND,
                        init_method=DIST_DEFAULT_INIT_METHOD,
                        world_size=DIST_DEFAULT_WORLD_SIZE,
                        rank=rank)

# # Another way to initialize with environment variables
# os.environ["MASTER_PORT"] = DIST_DEFAULT_PORT
# os.environ["MASTER_ADDR"] = DIST_DEFAULT_ADDR
# os.environ["WORLD_SIZE"] = str(DIST_DEFAULT_WORLD_SIZE)
# os.environ["RANK"] = str(rank)
# dist.init_process_group(backend=DIST_DEFAULT_BACKEND)

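The example initializes through init_method using a TCP address; environment variables work as well (see the commented-out code). A shared file system can also be used for initialization, as described in the torch.distributed documentation. I find TCP simple enough and broadly compatible, so this post sticks to TCP.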
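The three initialization styles differ only in the URL given as init_method. The sketch below builds those strings; the helper names are hypothetical, and the shared-file path is a placeholder rather than a path from this post.

```python
def tcp_init_method(addr: str, port: int) -> str:
    """URL for TCP initialization, e.g. tcp://localhost:12344."""
    return f"tcp://{addr}:{port}"


def file_init_method(path: str) -> str:
    """URL for shared file-system initialization; path must be visible to all processes."""
    return f"file://{path}"


# With env://, MASTER_ADDR, MASTER_PORT, WORLD_SIZE and RANK are read
# from environment variables instead of being encoded in the URL.
ENV_INIT_METHOD = "env://"

tcp_url = tcp_init_method("localhost", 12344)
file_url = file_init_method("/mnt/nfs/sharedfile")  # placeholder path
```

With TCP or a shared file, world_size and rank still have to be passed to init_process_group explicitly, as the example in this post does.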

DistributedDataParallel

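See the official torch.nn.parallel.DistributedDataParallel documentation:
torch.nn.parallel.DistributedDataParallel

First, note that on a machine with N GPUs, where N processes are spawned, each process must be bound to its own GPU before DDP is constructed, so that the processes do not contend for the same device.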

# ensuring that each process exclusively works on a single GPU
torch.cuda.set_device(rank)
model.cuda(rank)
# When using a single GPU per process and per
# DistributedDataParallel, we need to divide the batch size
# ourselves based on the total number of GPUs we have
model = nn.parallel.DistributedDataParallel(model, device_ids=[rank])

The original model is wrapped in the torch.nn.parallel.DistributedDataParallel class, and the model of this process is mapped to its GPU.

DistributedSampler

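See the official torch.utils.data.distributed.DistributedSampler documentation:
torch.utils.data.distributed.DistributedSampler

With multiple processes, each process should train on its own subset of the dataset, without overlap between processes. DistributedSampler performs this distributed sampling of a subset of the original dataset: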

# dataset
dataset = TinyDataset()
dist_sampler = data.distributed.DistributedSampler(
    dataset) if args.distributed else None
dataloader = data.dataloader.DataLoader(
    dataset,
    batch_size=DEFAULT_BATCH_SIZE // DIST_DEFAULT_WORLD_SIZE if args.distributed else DEFAULT_BATCH_SIZE,
    shuffle=(dist_sampler is None),
    num_workers=DEFAULT_NUM_WORKERS_PER_GPU if args.distributed else DEFAULT_NUM_WORKERS_PER_GPU * DIST_DEFAULT_WORLD_SIZE,
    sampler=dist_sampler)

Note that with multiple epochs, sampler.set_epoch(epoch) must be called before each epoch starts; otherwise every epoch iterates over the data in the same order.
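The idea behind the sampler, and the reason set_epoch matters, can be sketched in pure Python. The function `shard_indices` is a simplified, hypothetical approximation: it shuffles with a seed derived from the epoch, pads the index list so it divides evenly, and gives each rank every num_replicas-th index. It uses random.Random in place of the torch generator, so the concrete orderings differ from PyTorch's, and it assumes the dataset has at least as many samples as there are replicas.

```python
import math
import random
from typing import List


def shard_indices(dataset_len: int, num_replicas: int, rank: int,
                  epoch: int, seed: int = 0, shuffle: bool = True) -> List[int]:
    """Indices that one rank would see in one epoch (simplified sketch)."""
    indices = list(range(dataset_len))
    if shuffle:
        # Same seed and epoch on every rank: all ranks agree on the permutation.
        random.Random(seed + epoch).shuffle(indices)
    num_samples = math.ceil(dataset_len / num_replicas)
    total_size = num_samples * num_replicas
    # Pad with leading indices so that every rank gets num_samples items.
    indices += indices[: total_size - len(indices)]
    # Each rank takes a strided, non-overlapping slice.
    return indices[rank:total_size:num_replicas]


epoch0_shards = [shard_indices(10, 4, rank, epoch=0) for rank in range(4)]
epoch1_shards = [shard_indices(10, 4, rank, epoch=1) for rank in range(4)]
```

If the epoch passed in never changes, which corresponds to never calling set_epoch, the permutation is identical in every epoch.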

fasterrcnn_resnet50_fpn - Understanding Faster R-CNN from the torchvision Source Code

2020-11-25T11:41:12.000Z

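The torchvision package of PyTorch includes an implementation of Faster R-CNN. This post reads through the torchvision source code to understand how Faster R-CNN works internally, as a basis for building on it.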


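1 Interface layer

External usage

According to the torchvision documentation, a Faster R-CNN model object can be constructed directly with the fasterrcnn_resnet50_fpn function.

Specifically, the official documentation gives sample calls for training and for inference: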

torchvision.models.detection.fasterrcnn_resnet50_fpn

>>> model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
>>> # For training
>>> images, boxes = torch.rand(4, 3, 600, 1200), torch.rand(4, 11, 4)
>>> labels = torch.randint(1, 91, (4, 11))
>>> images = list(image for image in images)
>>> targets = []
>>> for i in range(len(images)):
>>>     d = {}
>>>     d['boxes'] = boxes[i]
>>>     d['labels'] = labels[i]
>>>     targets.append(d)
>>> output = model(images, targets)
>>> # For inference
>>> model.eval()
>>> x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
>>> predictions = model(x)
>>>
>>> # optionally, if you want to export the model to ONNX:
>>> torch.onnx.export(model, x, "faster_rcnn.onnx", opset_version = 11)

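In both training and inference, the model input is a list that represents several images (together with their target boxes and class labels during training).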

fasterrcnn_resnet50_fpn

The fasterrcnn_resnet50_fpn function is implemented in the torchvision.models.detection.faster_rcnn module; see the documentation for torchvision.models.detection.fasterrcnn_resnet50_fpn.

def fasterrcnn_resnet50_fpn(pretrained=False, progress=True,
                            num_classes=91, pretrained_backbone=True, trainable_backbone_layers=3, **kwargs):
    """
    Constructs a Faster R-CNN model with a ResNet-50-FPN backbone.

    The input to the model is expected to be a list of tensors, each of shape ``[C, H, W]``, one for each
    image, and should be in ``0-1`` range. Different images can have different sizes.

    The behavior of the model changes depending if it is in training or evaluation mode.

    During training, the model expects both the input tensors, as well as a targets (list of dictionary),
    containing:
        - boxes (``FloatTensor[N, 4]``): the ground-truth boxes in ``[x1, y1, x2, y2]`` format, with values of ``x``
          between ``0`` and ``W`` and values of ``y`` between ``0`` and ``H``
        - labels (``Int64Tensor[N]``): the class label for each ground-truth box

    The model returns a ``Dict[Tensor]`` during training, containing the classification and regression
    losses for both the RPN and the R-CNN.

    During inference, the model requires only the input tensors, and returns the post-processed
    predictions as a ``List[Dict[Tensor]]``, one for each input image. The fields of the ``Dict`` are as
    follows:
        - boxes (``FloatTensor[N, 4]``): the predicted boxes in ``[x1, y1, x2, y2]`` format, with values of ``x``
          between ``0`` and ``W`` and values of ``y`` between ``0`` and ``H``
        - labels (``Int64Tensor[N]``): the predicted labels for each image
        - scores (``Tensor[N]``): the scores or each prediction

    Faster R-CNN is exportable to ONNX for a fixed batch size with inputs images of fixed size.

    Example::

        >>> model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
        >>> # For training
        >>> images, boxes = torch.rand(4, 3, 600, 1200), torch.rand(4, 11, 4)
        >>> labels = torch.randint(1, 91, (4, 11))
        >>> images = list(image for image in images)
        >>> targets = []
        >>> for i in range(len(images)):
        >>>     d = {}
        >>>     d['boxes'] = boxes[i]
        >>>     d['labels'] = labels[i]
        >>>     targets.append(d)
        >>> output = model(images, targets)
        >>> # For inference
        >>> model.eval()
        >>> x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
        >>> predictions = model(x)
        >>>
        >>> # optionally, if you want to export the model to ONNX:
        >>> torch.onnx.export(model, x, "faster_rcnn.onnx", opset_version = 11)

    Arguments:
        pretrained (bool): If True, returns a model pre-trained on COCO train2017
        progress (bool): If True, displays a progress bar of the download to stderr
        pretrained_backbone (bool): If True, returns a model with backbone pre-trained on Imagenet
        num_classes (int): number of output classes of the model (including the background)
        trainable_backbone_layers (int): number of trainable (not frozen) resnet layers starting from final block.
            Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable.
    """
    assert trainable_backbone_layers <= 5 and trainable_backbone_layers >= 0
    # dont freeze any layers if pretrained model or backbone is not used
    if not (pretrained or pretrained_backbone):
        trainable_backbone_layers = 5
    if pretrained:
        # no need to download the backbone if pretrained is set
        pretrained_backbone = False
    backbone = resnet_fpn_backbone('resnet50', pretrained_backbone, trainable_layers=trainable_backbone_layers)
    model = FasterRCNN(backbone, num_classes, **kwargs)
    if pretrained:
        state_dict = load_state_dict_from_url(model_urls['fasterrcnn_resnet50_fpn_coco'],
                                              progress=progress)
        model.load_state_dict(state_dict)
    return model

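The function starts by checking its parameters:

It checks trainable_backbone_layers, which must be between 0 and 5 and gives the number of layers, counted from the last one, that remain trainable;
It checks pretrained and pretrained_backbone. If neither is set, no layer is frozen (trainable_backbone_layers is set to 5). If the whole model is pretrained, there is no need to download a pretrained backbone separately, because the pretrained parameters are loaded into the entire Faster R-CNN model.

The Faster R-CNN model is an instance of the FasterRCNN class, constructed with the given backbone.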
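The parameter handling described above can be isolated into a small pure function. This is a sketch with a hypothetical name, `resolve_backbone_options`; it mirrors the checks in the listing but raises ValueError where the torchvision code uses assert.

```python
from typing import Tuple


def resolve_backbone_options(pretrained: bool, pretrained_backbone: bool,
                             trainable_backbone_layers: int) -> Tuple[bool, int]:
    """Return the effective (pretrained_backbone, trainable_backbone_layers)."""
    if not 0 <= trainable_backbone_layers <= 5:
        raise ValueError("trainable_backbone_layers must be between 0 and 5")
    # Nothing is pretrained: freezing layers makes no sense, train all of them.
    if not (pretrained or pretrained_backbone):
        trainable_backbone_layers = 5
    # The full pretrained checkpoint already contains the backbone weights.
    if pretrained:
        pretrained_backbone = False
    return pretrained_backbone, trainable_backbone_layers


full_checkpoint = resolve_backbone_options(True, True, 3)
from_scratch = resolve_backbone_options(False, False, 3)
backbone_only = resolve_backbone_options(False, True, 3)
```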

resnet_fpn_backbone

The backbone is built by the public resnet_fpn_backbone function.

resnet_fpn_backbone is implemented in the torchvision.models.detection.backbone_utils module.

def resnet_fpn_backbone(
    backbone_name,
    pretrained,
    norm_layer=misc_nn_ops.FrozenBatchNorm2d,
    trainable_layers=3,
    returned_layers=None,
    extra_blocks=None
):
    """
    Constructs a specified ResNet backbone with FPN on top. Freezes the specified number of layers in the backbone.

    Examples::

        >>> from torchvision.models.detection.backbone_utils import resnet_fpn_backbone
        >>> backbone = resnet_fpn_backbone('resnet50', pretrained=True, trainable_layers=3)
        >>> # get some dummy image
        >>> x = torch.rand(1,3,64,64)
        >>> # compute the output
        >>> output = backbone(x)
        >>> print([(k, v.shape) for k, v in output.items()])
        >>> # returns
        >>>   [('0', torch.Size([1, 256, 16, 16])),
        >>>    ('1', torch.Size([1, 256, 8, 8])),
        >>>    ('2', torch.Size([1, 256, 4, 4])),
        >>>    ('3', torch.Size([1, 256, 2, 2])),
        >>>    ('pool', torch.Size([1, 256, 1, 1]))]

    Arguments:
        backbone_name (string): resnet architecture. Possible values are 'ResNet', 'resnet18', 'resnet34', 'resnet50',
             'resnet101', 'resnet152', 'resnext50_32x4d', 'resnext101_32x8d', 'wide_resnet50_2', 'wide_resnet101_2'
        norm_layer (torchvision.ops): it is recommended to use the default value. For details visit:
            (https://github.com/facebookresearch/maskrcnn-benchmark/issues/267)
        pretrained (bool): If True, returns a model with backbone pre-trained on Imagenet
        trainable_layers (int): number of trainable (not frozen) resnet layers starting from final block.
            Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable.
    """
    backbone = resnet.__dict__[backbone_name](
        pretrained=pretrained,
        norm_layer=norm_layer)

    # select layers that wont be frozen
    assert trainable_layers <= 5 and trainable_layers >= 0
    layers_to_train = ['layer4', 'layer3', 'layer2', 'layer1', 'conv1'][:trainable_layers]
    # freeze layers only if pretrained backbone is used
    for name, parameter in backbone.named_parameters():
        if all([not name.startswith(layer) for layer in layers_to_train]):
            parameter.requires_grad_(False)

    if extra_blocks is None:
        extra_blocks = LastLevelMaxPool()

    if returned_layers is None:
        returned_layers = [1, 2, 3, 4]
    assert min(returned_layers) > 0 and max(returned_layers) < 5
    return_layers = {f'layer{k}': str(v) for v, k in enumerate(returned_layers)}

    in_channels_stage2 = backbone.inplanes // 8
    in_channels_list = [in_channels_stage2 * 2 ** (i - 1) for i in returned_layers]
    out_channels = 256
    return BackboneWithFPN(backbone, return_layers, in_channels_list, out_channels, extra_blocks=extra_blocks)

First, the ResNet model named by backbone_name is instantiated.

Next, the valid range of trainable_layers is checked, and every layer outside the trainable set is frozen with parameter.requires_grad_(False).
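The selection rule can be tried out on plain strings. The sketch below applies the same prefix test as the listing to a handful of made-up parameter names; the function name `frozen_parameter_names` and the sample names are illustrative assumptions, not a dump of a real ResNet.

```python
from typing import List


def frozen_parameter_names(param_names: List[str], trainable_layers: int) -> List[str]:
    """Names that would be frozen for a given trainable_layers value."""
    assert 0 <= trainable_layers <= 5
    layers_to_train = ['layer4', 'layer3', 'layer2', 'layer1', 'conv1'][:trainable_layers]
    # A parameter is frozen when its name starts with none of the trainable prefixes.
    return [
        name for name in param_names
        if all(not name.startswith(layer) for layer in layers_to_train)
    ]


sample_names = [
    "conv1.weight",
    "layer1.0.conv1.weight",
    "layer2.0.conv1.weight",
    "layer3.0.conv1.weight",
    "layer4.0.conv1.weight",
    "fc.weight",
]
frozen_default = frozen_parameter_names(sample_names, trainable_layers=3)
```

With the default of 3, only layer2, layer3, and layer4 remain trainable. Under this prefix rule, a name such as fc.weight is frozen for every setting because it matches none of the listed prefixes.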

When extra_blocks is not specified, a max_pool2d layer is appended after the last feature map. The LastLevelMaxPool class that implements it is simple:

# defined in torchvision.ops.feature_pyramid_network
class LastLevelMaxPool(ExtraFPNBlock):
    """
    Applies a max_pool2d on top of the last feature map
    """
    def forward(
        self,
        x: List[Tensor],
        y: List[Tensor],
        names: List[str],
    ) -> Tuple[List[Tensor], List[str]]:
        names.append("pool")
        x.append(F.max_pool2d(x[-1], 1, 2, 0))
        return x, names

Following the official documentation from torch.nn.functional.max_pool2d to torch.nn.MaxPool2d, F.max_pool2d(x[-1], 1, 2, 0) means:

the input is x[-1];
the pooling window kernel_size is 1;
the stride is 2;
the padding is 0.
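With a window of size 1, the maximum of each window is just the single element it covers, so this pooling amounts to keeping every second row and column. The sketch below checks this on a nested list and uses the usual output-size formula, assuming dilation 1 and ceil_mode=False; both function names are hypothetical.

```python
from typing import List


def pool_output_size(size: int, kernel_size: int, stride: int, padding: int) -> int:
    """Output length along one dimension (dilation 1, ceil_mode=False assumed)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def max_pool_k1_s2(feature_map: List[List[float]]) -> List[List[float]]:
    """max_pool2d with kernel_size=1, stride=2, padding=0 on a 2-D nested list."""
    # Each 1x1 window contains one element, so max() returns that element.
    return [row[::2] for row in feature_map[::2]]


fm = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = max_pool_k1_s2(fm)
```

This agrees with the docstring example of resnet_fpn_backbone, where the last 2x2 feature map yields a 1x1 'pool' output.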

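Convolution-style operations are easier to understand with a visualization:

Convolution arithmetic

Then the remaining arguments are prepared:

return_layers is a dict that goes with the given backbone: each key is a module name of the backbone, and each value is the user-defined name of the returned feature map;
in_channels_list is a list that goes with the given backbone and return_layers: it holds the number of channels of each feature map returned by the backbone;
out_channels is an integer, the number of channels in the FPN.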
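These values can be computed by hand with the same expressions as the listing. The sketch below wraps them in a hypothetical function, `fpn_layer_config`, and assumes inplanes=2048, which is the value I expect for a constructed ResNet-50 (512 output planes times the bottleneck expansion of 4).

```python
from typing import Dict, List, Sequence, Tuple


def fpn_layer_config(inplanes: int,
                     returned_layers: Sequence[int] = (1, 2, 3, 4)
                     ) -> Tuple[Dict[str, str], List[int]]:
    """Compute return_layers and in_channels_list as resnet_fpn_backbone does."""
    assert min(returned_layers) > 0 and max(returned_layers) < 5
    return_layers = {f'layer{k}': str(v) for v, k in enumerate(returned_layers)}
    in_channels_stage2 = inplanes // 8
    in_channels_list = [in_channels_stage2 * 2 ** (i - 1) for i in returned_layers]
    return return_layers, in_channels_list


# Assumption: backbone.inplanes == 2048 for ResNet-50.
return_layers, in_channels_list = fpn_layer_config(2048)
```

The keys '0' to '3' are the feature-map names that later appear in the default MultiScaleRoIAlign(featmap_names=['0', '1', '2', '3']).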

2 Implementation layer

We use one example throughout:

import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]  # simulate two input images of different sizes
predictions = model(x)

We use the pretrained model and simulate two input images. Both have 3 channels; one has an $H \times W$ resolution of $300 \times 400$ and the other $500 \times 400$.
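Before these images reach the backbone, the model's transform rescales them according to the min_size and max_size parameters (800 and 1333 by default in the FasterRCNN constructor). The sketch below reflects my understanding of that rule: scale the shorter side to min_size unless the longer side would then exceed max_size. It is an assumption about the transform rather than code quoted from torchvision, and the function name `resize_scale` is hypothetical.

```python
def resize_scale(height: int, width: int,
                 min_size: int = 800, max_size: int = 1333) -> float:
    """Scale factor applied to an image of the given size (assumed rule)."""
    scale = min_size / min(height, width)
    # Cap the scale so that the longer side does not exceed max_size.
    if max(height, width) * scale > max_size:
        scale = max_size / max(height, width)
    return scale


scale_a = resize_scale(300, 400)  # first example image
scale_b = resize_scale(500, 400)  # second example image
```

Under this rule the first image is scaled by 800/300 and the second by 2.0, so their shorter sides both become 800 pixels while their longer sides still differ.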

FasterRCNN

The FasterRCNN class is implemented in torchvision/models/detection/faster_rcnn.py.

class FasterRCNN(GeneralizedRCNN):
    """
    Implements Faster R-CNN.

    The input to the model is expected to be a list of tensors, each of shape [C, H, W], one for each
    image, and should be in 0-1 range. Different images can have different sizes.

    The behavior of the model changes depending if it is in training or evaluation mode.

    During training, the model expects both the input tensors, as well as a targets (list of dictionary),
    containing:
        - boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with values of x
          between 0 and W and values of y between 0 and H
        - labels (Int64Tensor[N]): the class label for each ground-truth box

    The model returns a Dict[Tensor] during training, containing the classification and regression
    losses for both the RPN and the R-CNN.

    During inference, the model requires only the input tensors, and returns the post-processed
    predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as
    follows:
        - boxes (FloatTensor[N, 4]): the predicted boxes in [x1, y1, x2, y2] format, with values of x
          between 0 and W and values of y between 0 and H
        - labels (Int64Tensor[N]): the predicted labels for each image
        - scores (Tensor[N]): the scores or each prediction

    Arguments:
        backbone (nn.Module): the network used to compute the features for the model.
            It should contain a out_channels attribute, which indicates the number of output
            channels that each feature map has (and it should be the same for all feature maps).
            The backbone should return a single Tensor or and OrderedDict[Tensor].
        num_classes (int): number of output classes of the model (including the background).
            If box_predictor is specified, num_classes should be None.
        min_size (int): minimum size of the image to be rescaled before feeding it to the backbone
        max_size (int): maximum size of the image to be rescaled before feeding it to the backbone
        image_mean (Tuple[float, float, float]): mean values used for input normalization.
            They are generally the mean values of the dataset on which the backbone has been trained
            on
        image_std (Tuple[float, float, float]): std values used for input normalization.
            They are generally the std values of the dataset on which the backbone has been trained on
        rpn_anchor_generator (AnchorGenerator): module that generates the anchors for a set of feature
            maps.
        rpn_head (nn.Module): module that computes the objectness and regression deltas from the RPN
        rpn_pre_nms_top_n_train (int): number of proposals to keep before applying NMS during training
        rpn_pre_nms_top_n_test (int): number of proposals to keep before applying NMS during testing
        rpn_post_nms_top_n_train (int): number of proposals to keep after applying NMS during training
        rpn_post_nms_top_n_test (int): number of proposals to keep after applying NMS during testing
        rpn_nms_thresh (float): NMS threshold used for postprocessing the RPN proposals
        rpn_fg_iou_thresh (float): minimum IoU between the anchor and the GT box so that they can be
            considered as positive during training of the RPN.
        rpn_bg_iou_thresh (float): maximum IoU between the anchor and the GT box so that they can be
            considered as negative during training of the RPN.
        rpn_batch_size_per_image (int): number of anchors that are sampled during training of the RPN
            for computing the loss
        rpn_positive_fraction (float): proportion of positive anchors in a mini-batch during training
            of the RPN
        box_roi_pool (MultiScaleRoIAlign): the module which crops and resizes the feature maps in
            the locations indicated by the bounding boxes
        box_head (nn.Module): module that takes the cropped feature maps as input
        box_predictor (nn.Module): module that takes the output of box_head and returns the
            classification logits and box regression deltas.
        box_score_thresh (float): during inference, only return proposals with a classification score
            greater than box_score_thresh
        box_nms_thresh (float): NMS threshold for the prediction head. Used during inference
        box_detections_per_img (int): maximum number of detections per image, for all classes.
        box_fg_iou_thresh (float): minimum IoU between the proposals and the GT box so that they can be
            considered as positive during training of the classification head
        box_bg_iou_thresh (float): maximum IoU between the proposals and the GT box so that they can be
            considered as negative during training of the classification head
        box_batch_size_per_image (int): number of proposals that are sampled during training of the
            classification head
        box_positive_fraction (float): proportion of positive proposals in a mini-batch during training
            of the classification head
        bbox_reg_weights (Tuple[float, float, float, float]): weights for the encoding/decoding of the
            bounding boxes

    Example::

        >>> import torch
        >>> import torchvision
        >>> from torchvision.models.detection import FasterRCNN
        >>> from torchvision.models.detection.rpn import AnchorGenerator
        >>> # load a pre-trained model for classification and return
        >>> # only the features
        >>> backbone = torchvision.models.mobilenet_v2(pretrained=True).features
        >>> # FasterRCNN needs to know the number of
        >>> # output channels in a backbone. For mobilenet_v2, it's 1280
        >>> # so we need to add it here
        >>> backbone.out_channels = 1280
        >>>
        >>> # let's make the RPN generate 5 x 3 anchors per spatial
        >>> # location, with 5 different sizes and 3 different aspect
        >>> # ratios. We have a Tuple[Tuple[int]] because each feature
        >>> # map could potentially have different sizes and
        >>> # aspect ratios
        >>> anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
        >>>                                    aspect_ratios=((0.5, 1.0, 2.0),))
        >>>
        >>> # let's define what are the feature maps that we will
        >>> # use to perform the region of interest cropping, as well as
        >>> # the size of the crop after rescaling.
        >>> # if your backbone returns a Tensor, featmap_names is expected to
        >>> # be ['0']. More generally, the backbone should return an
        >>> # OrderedDict[Tensor], and in featmap_names you can choose which
        >>> # feature maps to use.
        >>> roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'],
        >>>                                                 output_size=7,
        >>>                                                 sampling_ratio=2)
        >>>
        >>> # put the pieces together inside a FasterRCNN model
        >>> model = FasterRCNN(backbone,
        >>>                    num_classes=2,
        >>>                    rpn_anchor_generator=anchor_generator,
        >>>                    box_roi_pool=roi_pooler)
        >>> model.eval()
        >>> x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
        >>> predictions = model(x)
    """

    def __init__(self, backbone, num_classes=None,
                 # transform parameters
                 min_size=800, max_size=1333,
                 image_mean=None, image_std=None,
                 # RPN parameters
                 rpn_anchor_generator=None, rpn_head=None,
                 rpn_pre_nms_top_n_train=2000, rpn_pre_nms_top_n_test=1000,
                 rpn_post_nms_top_n_train=2000, rpn_post_nms_top_n_test=1000,
                 rpn_nms_thresh=0.7,
                 rpn_fg_iou_thresh=0.7, rpn_bg_iou_thresh=0.3,
                 rpn_batch_size_per_image=256, rpn_positive_fraction=0.5,
                 # Box parameters
                 box_roi_pool=None, box_head=None, box_predictor=None,
                 box_score_thresh=0.05, box_nms_thresh=0.5, box_detections_per_img=100,
                 box_fg_iou_thresh=0.5, box_bg_iou_thresh=0.5,
                 box_batch_size_per_image=512, box_positive_fraction=0.25,
                 bbox_reg_weights=None):

        if not hasattr(backbone, "out_channels"):
            raise ValueError(
                "backbone should contain an attribute out_channels "
                "specifying the number of output channels (assumed to be the "
                "same for all the levels)")

        assert isinstance(rpn_anchor_generator, (AnchorGenerator, type(None)))
        assert isinstance(box_roi_pool, (MultiScaleRoIAlign, type(None)))

        if num_classes is not None:
            if box_predictor is not None:
                raise ValueError("num_classes should be None when box_predictor is specified")
        else:
            if box_predictor is None:
                raise ValueError("num_classes should not be None when box_predictor "
                                 "is not specified")

        out_channels = backbone.out_channels

        if rpn_anchor_generator is None:
            anchor_sizes = ((32,), (64,), (128,), (256,), (512,))
            aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
            rpn_anchor_generator = AnchorGenerator(
                anchor_sizes, aspect_ratios
            )
        if rpn_head is None:
            rpn_head = RPNHead(
                out_channels, rpn_anchor_generator.num_anchors_per_location()[0]
            )

        rpn_pre_nms_top_n = dict(training=rpn_pre_nms_top_n_train, testing=rpn_pre_nms_top_n_test)
        rpn_post_nms_top_n = dict(training=rpn_post_nms_top_n_train, testing=rpn_post_nms_top_n_test)

        rpn = RegionProposalNetwork(
            rpn_anchor_generator, rpn_head,
            rpn_fg_iou_thresh, rpn_bg_iou_thresh,
            rpn_batch_size_per_image, rpn_positive_fraction,
            rpn_pre_nms_top_n, rpn_post_nms_top_n, rpn_nms_thresh)

        if box_roi_pool is None:
            box_roi_pool = MultiScaleRoIAlign(
                featmap_names=['0', '1', '2', '3'],
                output_size=7,
                sampling_ratio=2)

        if box_head is None:
            resolution = box_roi_pool.output_size[0]
            representation_size = 1024
            box_head = TwoMLPHead(
                out_channels * resolution ** 2,
                representation_size)

        if box_predictor is None:
            representation_size = 1024
            box_predictor = FastRCNNPredictor(
                representation_size,
                num_classes)

        roi_heads = RoIHeads(
            # Box
            box_roi_pool, box_head, box_predictor,
            box_fg_iou_thresh, box_bg_iou_thresh,
            box_batch_size_per_image, box_positive_fraction,
            bbox_reg_weights,
            box_score_thresh, box_nms_thresh, box_detections_per_img)

        if image_mean is None:
            image_mean = [0.485, 0.456, 0.406]
        if image_std is None:
            image_std = [0.229, 0.224, 0.225]
        transform = GeneralizedRCNNTransform(min_size, max_size, image_mean, image_std)

        super(FasterRCNN, self).__init__(backbone, rpn, roi_heads, transform)

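The FasterRCNN code looks long, but most of it is documentation.

FasterRCNN only implements __init__. It inherits from GeneralizedRCNN, where the overall structure and computation flow are implemented, so the subclass only needs to validate parameters and instantiate its concrete submodules.

In short, FasterRCNN.__init__ validates parameters and prepares components, then passes the resulting backbone, rpn, roi_heads, and transform objects to the initializer of the parent class (GeneralizedRCNN), which builds the FasterRCNN instance.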
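One of these preparation steps is worth a closer look: the default anchor configuration and the value passed to RPNHead. The sketch below recomputes the number of anchors per location as sizes times aspect ratios for each feature map; the function here is a hypothetical stand-in for AnchorGenerator.num_anchors_per_location, written under that assumption.

```python
from typing import List, Sequence


def num_anchors_per_location(sizes: Sequence[Sequence[float]],
                             aspect_ratios: Sequence[Sequence[float]]) -> List[int]:
    """Anchors per spatial location, one entry per feature map (assumed rule)."""
    return [len(s) * len(r) for s, r in zip(sizes, aspect_ratios)]


# Defaults from FasterRCNN.__init__: one size per FPN level, three ratios each.
anchor_sizes = ((32,), (64,), (128,), (256,), (512,))
aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
default_counts = num_anchors_per_location(anchor_sizes, aspect_ratios)

# Configuration from the class docstring: a single feature map with 5 x 3 anchors.
single_map_counts = num_anchors_per_location(((32, 64, 128, 256, 512),), ((0.5, 1.0, 2.0),))
```

The default RPNHead receives the first entry of this list, which is why every feature map must use the same number of anchors per location.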

GeneralizedRCNN

GeneralizedRCNN is implemented in torchvision/models/detection/generalized_rcnn.py and, as the parent class, defines the overall computation of the R-CNN architecture.

class GeneralizedRCNN(nn.Module):
    """
    Main class for Generalized R-CNN.

    Arguments:
        backbone (nn.Module):
        rpn (nn.Module):
        roi_heads (nn.Module): takes the features + the proposals from the RPN and computes
            detections / masks from it.
        transform (nn.Module): performs the data transformation from the inputs to feed into
            the model
    """

    def __init__(self, backbone, rpn, roi_heads, transform):
        super(GeneralizedRCNN, self).__init__()
        self.transform = transform
        self.backbone = backbone
        self.rpn = rpn
        self.roi_heads = roi_heads
        # used only on torchscript mode
        self._has_warned = False

    @torch.jit.unused
    def eager_outputs(self, losses, detections):
        # type: (Dict[str, Tensor], List[Dict[str, Tensor]]) -> Union[Dict[str, Tensor], List[Dict[str, Tensor]]]
        if self.training:
            return losses

        return detections

    def forward(self, images, targets=None):
        # type: (List[Tensor], Optional[List[Dict[str, Tensor]]]) -> Tuple[Dict[str, Tensor], List[Dict[str, Tensor]]]
        """
        Arguments:
            images (list[Tensor]): images to be processed
            targets (list[Dict[Tensor]]): ground-truth boxes present in the image (optional)

        Returns:
            result (list[BoxList] or dict[Tensor]): the output from the model.
                During training, it returns a dict[Tensor] which contains the losses.
                During testing, it returns list[BoxList] contains additional fields
                like `scores`, `labels` and `mask` (for Mask R-CNN models).

        """
        if self.training and targets is None:
            raise ValueError("In training mode, targets should be passed")
        if self.training:
            assert targets is not None
            for target in targets:
                boxes = target["boxes"]
                if isinstance(boxes, torch.Tensor):
                    if len(boxes.shape) != 2 or boxes.shape[-1] != 4:
                        raise ValueError("Expected target boxes to be a tensor"
                                         "of shape [N, 4], got {:}.".format(
                                             boxes.shape))
                else:
                    raise ValueError("Expected target boxes to be of type "
                                     "Tensor, got {:}.".format(type(boxes)))

        original_image_sizes = torch.jit.annotate(List[Tuple[int, int]], [])
        for img in images:
            val = img.shape[-2:]
            assert len(val) == 2
            original_image_sizes.append((val[0], val[1]))

        images, targets = self.transform(images, targets)

        # Check for degenerate boxes
        # TODO: Move this to a function
        if targets is not None:
            for target_idx, target in enumerate(targets):
                boxes = target["boxes"]
                degenerate_boxes = boxes[:, 2:] <= boxes[:, :2]
                if degenerate_boxes.any():
                    # print the first degenerate box
                    bb_idx = torch.where(degenerate_boxes.any(dim=1))[0][0]
                    degen_bb: List[float] = boxes[bb_idx].tolist()
                    raise ValueError("All bounding boxes should have positive height and width."
                                     " Found invalid box {} for target at index {}."
                                     .format(degen_bb, target_idx))

        features = self.backbone(images.tensors)
        if isinstance(features, torch.Tensor):
            features = OrderedDict([('0', features)])
        proposals, proposal_losses = self.rpn(images, features, targets)
        detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
        detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)

        losses = {}
        losses.update(detector_losses)
        losses.update(proposal_losses)

        if torch.jit.is_scripting():
            if not self._has_warned:
                warnings.warn("RCNN always returns a (Losses, Detections) tuple in scripting")
                self._has_warned = True
            return (losses, detections)
        else:
            return self.eager_outputs(losses, detections)

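In __init__, GeneralizedRCNN defines the R-CNN architecture as four components:

transform: a transformation module that preprocesses the images and the other inputs (the targets);
backbone: a feature-extraction model; its input is the transformed image tensor and its output is the extracted image features;
rpn: an RPN model; its inputs are the images, the features extracted by the backbone and, during training, the targets that hold the ground-truth bounding boxes; its outputs are the predicted region proposals and the corresponding losses;
roi_heads: a RoIHeads model; its inputs are the features from the backbone, the proposals from the RPN, the image sizes and, during training, the targets; its outputs are the detections and the corresponding losses.

The forward method of this class essentially runs these four parts in order. Apart from argument checks and the distinction between training-time and inference-time inputs, the main logic can be summarized as: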

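def forward(self, images, targets=None):
    # 0. remember the original (H, W) of every image for postprocessing
    original_image_sizes = [(img.shape[-2], img.shape[-1]) for img in images]
    # 1. transform
    images, targets = self.transform(images, targets)
    # 2. backbone
    features = self.backbone(images.tensors)
    # 3. rpn
    proposals, proposal_losses = self.rpn(images, features, targets)
    # 4. roi_heads
    detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
    # 5. postprocess
    detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)
    # 6. losses in training mode, detections otherwise
    losses = {**detector_losses, **proposal_losses}
    return losses if self.training else detections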
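
The control flow above, including the switch between returning losses and returning detections, can be tried out with a minimal pure-Python sketch. Everything here is a stand-in: ToyRCNN and the toy_* functions are hypothetical names that return fixed dummy values, and only the loss key names are meant to match what torchvision reports. The stubs decide between training and inference by checking whether targets were passed, which is a simplification.

```python
class ToyRCNN:
    """Pure-Python stand-in that mirrors only the control flow of GeneralizedRCNN.forward."""

    def __init__(self, backbone, rpn, roi_heads, transform):
        self.transform = transform
        self.backbone = backbone
        self.rpn = rpn
        self.roi_heads = roi_heads
        self.training = True  # nn.Module keeps a flag with the same name

    def forward(self, images, targets=None):
        if self.training and targets is None:
            raise ValueError("In training mode, targets should be passed")
        images, targets = self.transform(images, targets)
        features = self.backbone(images)
        proposals, proposal_losses = self.rpn(images, features, targets)
        detections, detector_losses = self.roi_heads(features, proposals, targets)
        losses = {}
        losses.update(detector_losses)
        losses.update(proposal_losses)
        return losses if self.training else detections


# Hypothetical stubs: they return fixed dummy values instead of computing anything.
def toy_transform(images, targets):
    return images, targets


def toy_backbone(images):
    return {"0": ["features of " + name for name in images]}


def toy_rpn(images, features, targets):
    proposals = [[(0, 0, 10, 10)] for _ in images]
    losses = {"loss_objectness": 0.5, "loss_rpn_box_reg": 0.1} if targets is not None else {}
    return proposals, losses


def toy_roi_heads(features, proposals, targets):
    if targets is not None:
        return [], {"loss_classifier": 0.7, "loss_box_reg": 0.2}
    return [{"boxes": p, "labels": [1], "scores": [0.9]} for p in proposals], {}


model = ToyRCNN(toy_backbone, toy_rpn, toy_roi_heads, toy_transform)
train_out = model.forward(["img_a", "img_b"], targets=[{}, {}])  # dict of losses
model.training = False
eval_out = model.forward(["img_a", "img_b"])  # one detection dict per image
print(sorted(train_out))
print(len(eval_out), eval_out[0]["boxes"])
```

In training mode the call yields the merged loss dict of the RPN and the RoI heads; in evaluation mode it yields one detection dict per image.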

GeneralizedRCNNTransform

The preprocessing of input images for the Faster R-CNN model is implemented by the GeneralizedRCNNTransform class in the torchvision.models.detection.transform module.

class GeneralizedRCNNTransform(nn.Module):
    """
    Performs input / target transformation before feeding the data to a GeneralizedRCNN
    model.

    The transformations it perform are:
        - input normalization (mean subtraction and std division)
        - input / target resizing to match min_size / max_size

    It returns a ImageList for the inputs, and a List[Dict[Tensor]] for the targets
    """

    def __init__(self, min_size, max_size, image_mean, image_std):
        super(GeneralizedRCNNTransform, self).__init__()
        if not isinstance(min_size, (list, tuple)):
            min_size = (min_size,)
        self.min_size = min_size
        self.max_size = max_size
        self.image_mean = image_mean
        self.image_std = image_std

    def forward(self,
                images,       # type: List[Tensor]
                targets=None  # type: Optional[List[Dict[str, Tensor]]]
                ):
        # type: (...) -> Tuple[ImageList, Optional[List[Dict[str, Tensor]]]]
        images = [img for img in images]
        if targets is not None:
            # make a copy of targets to avoid modifying it in-place
            # once torchscript supports dict comprehension
            # this can be simplified as as follows
            # targets = [{k: v for k,v in t.items()} for t in targets]
            targets_copy: List[Dict[str, Tensor]] = []
            for t in targets:
                data: Dict[str, Tensor] = {}
                for k, v in t.items():
                    data[k] = v
                targets_copy.append(data)
            targets = targets_copy
        for i in range(len(images)):
            image = images[i]
            target_index = targets[i] if targets is not None else None

            if image.dim() != 3:
                raise ValueError("images is expected to be a list of 3d tensors "
                                 "of shape [C, H, W], got {}".format(image.shape))
            image = self.normalize(image)
            image, target_index = self.resize(image, target_index)
            images[i] = image
            if targets is not None and target_index is not None:
                targets[i] = target_index

        image_sizes = [img.shape[-2:] for img in images]
        images = self.batch_images(images)
        image_sizes_list = torch.jit.annotate(List[Tuple[int, int]], [])
        for image_size in image_sizes:
            assert len(image_size) == 2
            image_sizes_list.append((image_size[0], image_size[1]))

        image_list = ImageList(images, image_sizes_list)
        return image_list, targets

    def normalize(self, image):
        dtype, device = image.dtype, image.device
        mean = torch.as_tensor(self.image_mean, dtype=dtype, device=device)
        std = torch.as_tensor(self.image_std, dtype=dtype, device=device)
        return (image - mean[:, None, None]) / std[:, None, None]

    def torch_choice(self, k):
        # type: (List[int]) -> int
        """
        Implements `random.choice` via torch ops so it can be compiled with
        TorchScript. Remove if https://github.com/pytorch/pytorch/issues/25803
        is fixed.
        """
        index = int(torch.empty(1).uniform_(0., float(len(k))).item())
        return k[index]

    def resize(self, image, target):
        # type: (Tensor, Optional[Dict[str, Tensor]]) -> Tuple[Tensor, Optional[Dict[str, Tensor]]]
        h, w = image.shape[-2:]
        if self.training:
            size = float(self.torch_choice(self.min_size))
        else:
            # FIXME assume for now that testing uses the largest scale
            size = float(self.min_size[-1])
        if torchvision._is_tracing():
            image, target = _resize_image_and_masks_onnx(image, size, float(self.max_size), target)
        else:
            image, target = _resize_image_and_masks(image, size, float(self.max_size), target)

        if target is None:
            return image, target

        bbox = target["boxes"]
        bbox = resize_boxes(bbox, (h, w), image.shape[-2:])
        target["boxes"] = bbox

        if "keypoints" in target:
            keypoints = target["keypoints"]
            keypoints = resize_keypoints(keypoints, (h, w), image.shape[-2:])
            target["keypoints"] = keypoints
        return image, target

    # _onnx_batch_images() is an implementation of
    # batch_images() that is supported by ONNX tracing.
    @torch.jit.unused
    def _onnx_batch_images(self, images, size_divisible=32):
        # type: (List[Tensor], int) -> Tensor
        max_size = []
        for i in range(images[0].dim()):
            max_size_i = torch.max(torch.stack([img.shape[i] for img in images]).to(torch.float32)).to(torch.int64)
            max_size.append(max_size_i)
        stride = size_divisible
        max_size[1] = (torch.ceil((max_size[1].to(torch.float32)) / stride) * stride).to(torch.int64)
        max_size[2] = (torch.ceil((max_size[2].to(torch.float32)) / stride) * stride).to(torch.int64)
        max_size = tuple(max_size)

        # work around for
        # pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)
        # which is not yet supported in onnx
        padded_imgs = []
        for img in images:
            padding = [(s1 - s2) for s1, s2 in zip(max_size, tuple(img.shape))]
            padded_img = torch.nn.functional.pad(img, (0, padding[2], 0, padding[1], 0, padding[0]))
            padded_imgs.append(padded_img)

        return torch.stack(padded_imgs)

    def max_by_axis(self, the_list):
        # type: (List[List[int]]) -> List[int]
        maxes = the_list[0]
        for sublist in the_list[1:]:
            for index, item in enumerate(sublist):
                maxes[index] = max(maxes[index], item)
        return maxes

    def batch_images(self, images, size_divisible=32):
        # type: (List[Tensor], int) -> Tensor
        if torchvision._is_tracing():
            # batch_images() does not export well to ONNX
            # call _onnx_batch_images() instead
            return self._onnx_batch_images(images, size_divisible)

        max_size = self.max_by_axis([list(img.shape) for img in images])
        stride = float(size_divisible)
        max_size = list(max_size)
        max_size[1] = int(math.ceil(float(max_size[1]) / stride) * stride)
        max_size[2] = int(math.ceil(float(max_size[2]) / stride) * stride)

        batch_shape = [len(images)] + max_size
        batched_imgs = images[0].new_full(batch_shape, 0)
        for img, pad_img in zip(images, batched_imgs):
            pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)

        return batched_imgs

    def postprocess(self,
                    result,               # type: List[Dict[str, Tensor]]
                    image_shapes,         # type: List[Tuple[int, int]]
                    original_image_sizes  # type: List[Tuple[int, int]]
                    ):
        # type: (...) -> List[Dict[str, Tensor]]
        if self.training:
            return result
        for i, (pred, im_s, o_im_s) in enumerate(zip(result, image_shapes, original_image_sizes)):
            boxes = pred["boxes"]
            boxes = resize_boxes(boxes, im_s, o_im_s)
            result[i]["boxes"] = boxes
            if "masks" in pred:
                masks = pred["masks"]
                masks = paste_masks_in_image(masks, boxes, o_im_s)
                result[i]["masks"] = masks
            if "keypoints" in pred:
                keypoints = pred["keypoints"]
                keypoints = resize_keypoints(keypoints, im_s, o_im_s)
                result[i]["keypoints"] = keypoints
        return result

    def __repr__(self):
        format_string = self.__class__.__name__ + '('
        _indent = '\n    '
        format_string += "{0}Normalize(mean={1}, std={2})".format(_indent, self.image_mean, self.image_std)
        format_string += "{0}Resize(min_size={1}, max_size={2}, mode='bilinear')".format(_indent, self.min_size,
                                                                                         self.max_size)
        format_string += '\n)'
        return format_string

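The initial transformation of the input images is implemented in forward. It mainly performs the normalize and resize operations and then batches the images:

self.normalize: its parameters are set in the FasterRCNN initializer to image_mean = [0.485, 0.456, 0.406] and image_std = [0.229, 0.224, 0.225];
self.resize: its parameters are set in the FasterRCNN initializer to min_size=800, max_size=1333;
self.batch_images: pads the images of a batch so that they all share one tensor size (height and width are rounded up to a multiple of size_divisible=32).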
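
As a small illustration of the normalize step, the sketch below applies the same per-channel formula, (value - mean) / std, to a single RGB pixel in plain Python. The function name normalize_pixel is hypothetical; the real method works on whole [C, H, W] tensors.

```python
IMAGE_MEAN = [0.485, 0.456, 0.406]
IMAGE_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb, mean=IMAGE_MEAN, std=IMAGE_STD):
    """Subtract the channel mean and divide by the channel std, one channel at a time."""
    return [(value - m) / s for value, m, s in zip(rgb, mean, std)]


# A pixel equal to the mean maps to zero in every channel.
print(normalize_pixel(IMAGE_MEAN))
# A pure white pixel (1.0 in every channel) maps to positive values.
print(normalize_pixel([1.0, 1.0, 1.0]))
```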

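With these defaults, and taking the example from the beginning of this section (sizes are height × width):

After resize, the shorter side is scaled to 800 (as long as the longer side does not exceed 1333), so image 1 ($300 \times 400$) becomes $800 \times 1066$ and image 2 ($500 \times 400$) becomes $1000 \times 800$;
Because batching the images into one tensor pads them, both images end up in a tensor with spatial size $1024 \times 1088$.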
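
The arithmetic behind these numbers can be reproduced with a short sketch. It assumes the two input images have height × width of 300 × 400 and 500 × 400, and it only mimics the size computation of resize and batch_images as I read them from the listing above; resized_size and batched_shape are hypothetical helper names.

```python
import math


def resized_size(h, w, min_size=800, max_size=1333):
    """Scale so the shorter side becomes min_size, unless the longer side would exceed max_size."""
    scale = min_size / min(h, w)
    if max(h, w) * scale > max_size:
        scale = max_size / max(h, w)
    return math.floor(h * scale), math.floor(w * scale)


def batched_shape(sizes, channels=3, size_divisible=32):
    """Padded batch shape: per-axis maximum, rounded up to a multiple of size_divisible."""
    max_h = max(h for h, _ in sizes)
    max_w = max(w for _, w in sizes)
    pad_h = math.ceil(max_h / size_divisible) * size_divisible
    pad_w = math.ceil(max_w / size_divisible) * size_divisible
    return (len(sizes), channels, pad_h, pad_w)


sizes = [resized_size(300, 400), resized_size(500, 400)]
print(sizes)                 # [(800, 1066), (1000, 800)]
print(batched_shape(sizes))  # (2, 3, 1024, 1088)
```

The padded height comes from image 2 (1000 rounded up to 1024) and the padded width from image 1 (1066 rounded up to 1088).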

BackboneWithFPN

BackboneWithFPN is implemented in torchvision.models.detection.backbone_utils. It takes some intermediate layers of a ResNet model as the backbone and attaches an FPN after them.

class BackboneWithFPN(nn.Module):
    """
    Adds a FPN on top of a model.
    Internally, it uses torchvision.models._utils.IntermediateLayerGetter to
    extract a submodel that returns the feature maps specified in return_layers.
    The same limitations of IntermediatLayerGetter apply here.
    Arguments:
        backbone (nn.Module)
        return_layers (Dict[name, new_name]): a dict containing the names
            of the modules for which the activations will be returned as
            the key of the dict, and the value of the dict is the name
            of the returned activation (which the user can specify).
        in_channels_list (List[int]): number of channels for each feature map
            that is returned, in the order they are present in the OrderedDict
        out_channels (int): number of channels in the FPN.
    Attributes:
        out_channels (int): the number of channels in the FPN
    """
    def __init__(self, backbone, return_layers, in_channels_list, out_channels, extra_blocks=None):
        super(BackboneWithFPN, self).__init__()

        if extra_blocks is None:
            extra_blocks = LastLevelMaxPool()

        self.body = IntermediateLayerGetter(backbone, return_layers=return_layers)
        self.fpn = FeaturePyramidNetwork(
            in_channels_list=in_channels_list,
            out_channels=out_channels,
            extra_blocks=extra_blocks,
        )
        self.out_channels = out_channels

    def forward(self, x):
        x = self.body(x)
        x = self.fpn(x)
        return x

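The class is simple; it is essentially a composition that packages the backbone and the FPN together:

the intermediate layers taken from the backbone (which provide the feature maps) become the model's body;
an FPN (FeaturePyramidNetwork) is constructed as the model's fpn.

The data flow is equally brief: the input x goes through body and then through fpn.

With the backbone+FPN model in place, the Faster R-CNN model can be built.

In torchvision's Faster R-CNN, the backbone extracts the image features and is implemented as ResNet intermediate layers followed by an FPN.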

ResNet

class ResNet(nn.Module)

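ResNet is implemented in the torchvision.models.resnet module. It is an ordinary convolutional neural network, so this article does not go into it.

With a ResNet as the backbone, resnet_fpn_backbone builds a model that attaches an FPN after the ResNet; concretely, it constructs a BackboneWithFPN object.

In the torchvision implementation:

by default the last three layers (ResNet's layer4, layer3 and layer2) are trainable and the rest are frozen;
by default the feature maps of the last four layers (layer1, layer2, layer3, layer4) are returned under the names 0, 1, 2, 3; for a bottleneck ResNet such as ResNet-50 their channel counts are 256, 512, 1024 and 2048 respectively.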
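
These defaults can be derived with a few lines of plain Python. The sketch follows my understanding of how resnet_fpn_backbone computes its arguments for BackboneWithFPN; the value inplanes = 2048 is an assumption that holds for ResNet-50 after its last stage, and no torchvision object is created here.

```python
# Assumed inputs: ResNet-50 (inplanes == 2048) and the default arguments.
inplanes = 2048
trainable_layers = 3
returned_layers = [1, 2, 3, 4]

# Layers are unfrozen starting from the end of the network.
layers_to_train = ["layer4", "layer3", "layer2", "layer1", "conv1"][:trainable_layers]

# Map module names to the keys of the returned OrderedDict.
return_layers = {f"layer{k}": str(v) for v, k in enumerate(returned_layers)}

# Channel count doubles at every stage, starting from inplanes // 8.
in_channels_stage2 = inplanes // 8
in_channels_list = [in_channels_stage2 * 2 ** (i - 1) for i in returned_layers]

print(layers_to_train)
print(return_layers)
print(in_channels_list)
```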

After the ResNet part, the example from this section goes from the input tensor of size $2 \times 3 \times 1024 \times 1088$ to an OrderedDict:

'0': shape[2, 256, 256, 272], from ResNet's layer1;
'1': shape[2, 512, 128, 136], from ResNet's layer2;
'2': shape[2, 1024, 64, 68], from ResNet's layer3;
'3': shape[2, 2048, 32, 34], from ResNet's layer4.
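
These shapes follow from the total stride of each stage (4, 8, 16 and 32 relative to the input). A minimal sketch of that bookkeeping, assuming ResNet-50 channel counts and a hypothetical helper name resnet_feature_shapes:

```python
import math


def resnet_feature_shapes(batch, height, width):
    """Shapes of the layer1..layer4 outputs, keyed like the returned OrderedDict."""
    channels = {"0": 256, "1": 512, "2": 1024, "3": 2048}
    strides = {"0": 4, "1": 8, "2": 16, "3": 32}
    return {
        name: (batch, channels[name],
               math.ceil(height / strides[name]),
               math.ceil(width / strides[name]))
        for name in channels
    }


shapes = resnet_feature_shapes(2, 1024, 1088)
for name, shape in shapes.items():
    print(name, shape)
```

Because batch_images pads height and width to multiples of 32, every division here is exact.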

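FPN (FeaturePyramidNetwork)

FeaturePyramidNetwork is implemented in the torchvision.ops.feature_pyramid_network module. FPN extracts features in a pyramid structure: the lower layers have small receptive fields, so their features represent small objects, while the higher layers have large receptive fields, so their features suit large objects. In object detection, pairing small anchors with the lower levels and large anchors with the higher levels helps detect small and large objects at the same time.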

class FeaturePyramidNetwork(nn.Module):
    """
    Module that adds a FPN from on top of a set of feature maps. This is based on
    `"Feature Pyramid Network for Object Detection" <https://arxiv.org/abs/1612.03144>`_.

    The feature maps are currently supposed to be in increasing depth
    order.

    The input to the model is expected to be an OrderedDict[Tensor], containing
    the feature maps on top of which the FPN will be added.

    Arguments:
        in_channels_list (list[int]): number of channels for each feature map that
            is passed to the module
        out_channels (int): number of channels of the FPN representation
        extra_blocks (ExtraFPNBlock or None): if provided, extra operations will
            be performed. It is expected to take the fpn features, the original
            features and the names of the original features as input, and returns
            a new list of feature maps and their corresponding names

    Examples::

        >>> m = torchvision.ops.FeaturePyramidNetwork([10, 20, 30], 5)
        >>> # get some dummy data
        >>> x = OrderedDict()
        >>> x['feat0'] = torch.rand(1, 10, 64, 64)
        >>> x['feat2'] = torch.rand(1, 20, 16, 16)
        >>> x['feat3'] = torch.rand(1, 30, 8, 8)
        >>> # compute the FPN on top of x
        >>> output = m(x)
        >>> print([(k, v.shape) for k, v in output.items()])
        >>> # returns
        >>>   [('feat0', torch.Size([1, 5, 64, 64])),
        >>>    ('feat2', torch.Size([1, 5, 16, 16])),
        >>>    ('feat3', torch.Size([1, 5, 8, 8]))]

    """
    def __init__(
        self,
        in_channels_list: List[int],
        out_channels: int,
        extra_blocks: Optional[ExtraFPNBlock] = None,
    ):
        super(FeaturePyramidNetwork, self).__init__()
        self.inner_blocks = nn.ModuleList()
        self.layer_blocks = nn.ModuleList()
        for in_channels in in_channels_list:
            if in_channels == 0:
                raise ValueError("in_channels=0 is currently not supported")
            inner_block_module = nn.Conv2d(in_channels, out_channels, 1)
            layer_block_module = nn.Conv2d(out_channels, out_channels, 3, padding=1)
            self.inner_blocks.append(inner_block_module)
            self.layer_blocks.append(layer_block_module)

        # initialize parameters now to avoid modifying the initialization of top_blocks
        for m in self.children():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_uniform_(m.weight, a=1)
                nn.init.constant_(m.bias, 0)

        if extra_blocks is not None:
            assert isinstance(extra_blocks, ExtraFPNBlock)
        self.extra_blocks = extra_blocks

    def get_result_from_inner_blocks(self, x: Tensor, idx: int) -> Tensor:
        """
        This is equivalent to self.inner_blocks[idx](x),
        but torchscript doesn't support this yet
        """
        num_blocks = 0
        for m in self.inner_blocks:
            num_blocks += 1
        if idx < 0:
            idx += num_blocks
        i = 0
        out = x
        for module in self.inner_blocks:
            if i == idx:
                out = module(x)
            i += 1
        return out

    def get_result_from_layer_blocks(self, x: Tensor, idx: int) -> Tensor:
        """
        This is equivalent to self.layer_blocks[idx](x),
        but torchscript doesn't support this yet
        """
        num_blocks = 0
        for m in self.layer_blocks:
            num_blocks += 1
        if idx < 0:
            idx += num_blocks
        i = 0
        out = x
        for module in self.layer_blocks:
            if i == idx:
                out = module(x)
            i += 1
        return out

    def forward(self, x: Dict[str, Tensor]) -> Dict[str, Tensor]:
        """
        Computes the FPN for a set of feature maps.

        Arguments:
            x (OrderedDict[Tensor]): feature maps for each feature level.

        Returns:
            results (OrderedDict[Tensor]): feature maps after FPN layers.
                They are ordered from highest resolution first.
        """
        # unpack OrderedDict into two lists for easier handling
        names = list(x.keys())
        x = list(x.values())

        last_inner = self.get_result_from_inner_blocks(x[-1], -1)
        results = []
        results.append(self.get_result_from_layer_blocks(last_inner, -1))

        for idx in range(len(x) - 2, -1, -1):
            inner_lateral = self.get_result_from_inner_blocks(x[idx], idx)
            feat_shape = inner_lateral.shape[-2:]
            inner_top_down = F.interpolate(last_inner, size=feat_shape, mode="nearest")
            last_inner = inner_lateral + inner_top_down
            results.insert(0, self.get_result_from_layer_blocks(last_inner, idx))

        if self.extra_blocks is not None:
            results, names = self.extra_blocks(results, x, names)

        # make it back an OrderedDict
        out = OrderedDict([(k, v) for k, v in zip(names, results)])

        return out

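Following the original paper, the FPN output feature map $P_n$ at level n is obtained by merging two inputs:

lateral: the CNN's level-n feature map $C_n$, passed through a 1×1 convolution;
top-down upsampling: the level-(n+1) map of the top-down pathway (the merged map that $P_{n+1}$ is computed from), upsampled by 2× in both height and width to the level-n size.

A 3×3 convolution is then applied to the merged feature map to reduce the aliasing effect introduced by upsampling.

The result at each level is the final feature map $P_n$. For example, the ResNet feature maps of stages 2 to 5, $\{C_2, C_3, C_4, C_5\}$, yield $\{P_2, P_3, P_4, P_5\}$ through the FPN, and each $P_n$ has the same spatial size as its $C_n$.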

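In torchvision's implementation:

self.inner_blocks holds all of the FPN's 1×1 convolutions;
self.layer_blocks holds the 3×3 convolutions that are applied after merging.

Both are nn.ModuleList() objects. In __init__, a for loop over the n input feature maps fills them with nn.Conv2d objects that all use the same out_channels.

The core of the computation defined in forward can be summarized as: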

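def forward(self, x: Dict[str, Tensor]):
    # unpack the OrderedDict into names and feature maps
    names = list(x.keys())
    x = list(x.values())

    # start from the deepest (lowest-resolution) level
    last_inner = self.get_result_from_inner_blocks(x[-1], -1)
    results = [self.get_result_from_layer_blocks(last_inner, -1)]

    # walk back towards the highest-resolution level
    for idx in range(len(x) - 2, -1, -1):
        inner_lateral = self.get_result_from_inner_blocks(x[idx], idx)
        feat_shape = inner_lateral.shape[-2:]
        inner_top_down = F.interpolate(last_inner, size=feat_shape, mode="nearest")
        last_inner = inner_lateral + inner_top_down
        results.insert(0, self.get_result_from_layer_blocks(last_inner, idx))

    if self.extra_blocks is not None:
        results, names = self.extra_blocks(results, x, names)

    out = OrderedDict([(k, v) for k, v in zip(names, results)])
    return out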

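Concretely, the result of each level (the $P_n$ of the paper) is computed from the last level back to the first:

inner_lateral is the CNN feature map after the 1×1 convolution, computed by self.get_result_from_inner_blocks(x[idx], idx);
inner_top_down is upsampled from last_inner, the merged map of the next level n+1 (before its 3×3 convolution); the upsampling is done by interpolation, F.interpolate(last_inner, size=feat_shape, mode="nearest");
last_inner is the merge of the two, implemented as element-wise addition;
before being added to results, it passes through a 3×3 convolution, i.e. self.get_result_from_layer_blocks(last_inner, idx).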
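
The top-down merge in these steps can be illustrated on tiny single-channel maps stored as nested lists. This is only a sketch of the upsample-and-add pattern: the 1×1 and 3×3 convolutions are treated as identity, the nearest-neighbour rule is a simple integer version, and upsample_nearest, add_maps and toy_fpn are hypothetical names.

```python
def upsample_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list to out_h x out_w."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [[fmap[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def add_maps(a, b):
    """Element-wise addition of two equally sized 2-D lists."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def toy_fpn(laterals):
    """laterals: maps ordered from highest to lowest resolution (stand-ins for the 1x1 conv outputs)."""
    last_inner = laterals[-1]
    results = [last_inner]
    for idx in range(len(laterals) - 2, -1, -1):
        inner_lateral = laterals[idx]
        inner_top_down = upsample_nearest(last_inner, len(inner_lateral), len(inner_lateral[0]))
        last_inner = add_maps(inner_lateral, inner_top_down)
        results.insert(0, last_inner)
    return results


fine = [[1] * 4 for _ in range(4)]   # 4 x 4 high-resolution map
coarse = [[10, 20], [30, 40]]        # 2 x 2 low-resolution map
results = toy_fpn([fine, coarse])
for row in results[0]:
    print(row)
```

Each coarse value is copied into a 2 × 2 block of the finer level before the addition, which is why the merged map keeps the resolution of the lateral input.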

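Finally, if an extra block is configured, it is applied as well and its result is appended to the results.

In this case, LastLevelMaxPool is appended at the end of the FPN, and its output is added to names under the key pool.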

After the FPN, the example from this section goes from the four ResNet feature maps with different channel counts to an OrderedDict whose levels all have the same number of channels:

'0': shape[2, 256, 256, 272], from ResNet's layer1;
'1': shape[2, 256, 128, 136], from ResNet's layer2;
'2': shape[2, 256, 64, 68], from ResNet's layer3;
'3': shape[2, 256, 32, 34], from ResNet's layer4;
'pool': shape[2, 256, 16, 17], from LastLevelMaxPool, which the FPN uses as extra_blocks.
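
The shape bookkeeping for the FPN output can be sketched as follows. It assumes out_channels = 256 and that the extra level is a max pooling with kernel size 1, stride 2 and no padding applied to the last FPN map, which is how I understand LastLevelMaxPool; fpn_output_shapes is a hypothetical helper name.

```python
def fpn_output_shapes(in_shapes, out_channels=256):
    """Map the backbone feature shapes to the FPN output shapes, plus the extra 'pool' level."""
    out = {}
    for name, (n, _, h, w) in in_shapes.items():
        # The spatial size is kept; only the channel count changes.
        out[name] = (n, out_channels, h, w)
    last_n, _, last_h, last_w = list(in_shapes.values())[-1]
    # Max pooling with kernel 1 and stride 2 keeps every second row and column.
    out["pool"] = (last_n, out_channels, (last_h - 1) // 2 + 1, (last_w - 1) // 2 + 1)
    return out


backbone_shapes = {
    "0": (2, 256, 256, 272),
    "1": (2, 512, 128, 136),
    "2": (2, 1024, 64, 68),
    "3": (2, 2048, 32, 34),
}
fpn_shapes = fpn_output_shapes(backbone_shapes)
for name, shape in fpn_shapes.items():
    print(name, shape)
```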

RegionProposalNetwork

RegionProposalNetwork is implemented in the torchvision.models.detection.rpn module.

Its implementation is fairly long; the parts to focus on are __init__ and forward.
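
One step of the listing that follows, assign_targets_to_anchors, labels every anchor by its best IoU with the ground-truth boxes. The plain-Python sketch below shows that rule on its own. The thresholds 0.7 and 0.3 are assumed values (the RPN defaults of FasterRCNN, which are not shown in this section), the allow_low_quality_matches behaviour of the Matcher is deliberately left out, and box_iou and label_anchors are illustrative functions, not the torchvision ones.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def label_anchors(anchors, gt_boxes, fg_iou_thresh=0.7, bg_iou_thresh=0.3):
    """Return 1.0 (positive), 0.0 (background) or -1.0 (ignored) for every anchor."""
    if not gt_boxes:
        # An image without ground-truth boxes is treated as all background.
        return [0.0] * len(anchors)
    labels = []
    for anchor in anchors:
        best_iou = max(box_iou(gt, anchor) for gt in gt_boxes)
        if best_iou >= fg_iou_thresh:
            labels.append(1.0)
        elif best_iou < bg_iou_thresh:
            labels.append(0.0)
        else:
            labels.append(-1.0)  # between the thresholds: not used for the loss
    return labels


gt_boxes = [(0, 0, 10, 10)]
anchors = [(0, 0, 10, 10), (0, 0, 10, 20), (20, 20, 30, 30)]
labels = label_anchors(anchors, gt_boxes)
print(labels)  # [1.0, -1.0, 0.0]
```

The second anchor overlaps the ground-truth box with IoU 0.5, which falls between the two thresholds, so it is excluded from the loss rather than counted as background.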

class RegionProposalNetwork(torch.nn.Module):
    """
    Implements Region Proposal Network (RPN).

    Arguments:
        anchor_generator (AnchorGenerator): module that generates the anchors for a set of feature
            maps.
        head (nn.Module): module that computes the objectness and regression deltas
        fg_iou_thresh (float): minimum IoU between the anchor and the GT box so that they can be
            considered as positive during training of the RPN.
        bg_iou_thresh (float): maximum IoU between the anchor and the GT box so that they can be
            considered as negative during training of the RPN.
        batch_size_per_image (int): number of anchors that are sampled during training of the RPN
            for computing the loss
        positive_fraction (float): proportion of positive anchors in a mini-batch during training
            of the RPN
        pre_nms_top_n (Dict[int]): number of proposals to keep before applying NMS. It should
            contain two fields: training and testing, to allow for different values depending
            on training or evaluation
        post_nms_top_n (Dict[int]): number of proposals to keep after applying NMS. It should
            contain two fields: training and testing, to allow for different values depending
            on training or evaluation
        nms_thresh (float): NMS threshold used for postprocessing the RPN proposals

    """
    __annotations__ = {
        'box_coder': det_utils.BoxCoder,
        'proposal_matcher': det_utils.Matcher,
        'fg_bg_sampler': det_utils.BalancedPositiveNegativeSampler,
        'pre_nms_top_n': Dict[str, int],
        'post_nms_top_n': Dict[str, int],
    }

    def __init__(self,
                 anchor_generator,
                 head,
                 #
                 fg_iou_thresh, bg_iou_thresh,
                 batch_size_per_image, positive_fraction,
                 #
                 pre_nms_top_n, post_nms_top_n, nms_thresh):
        super(RegionProposalNetwork, self).__init__()
        self.anchor_generator = anchor_generator
        self.head = head
        self.box_coder = det_utils.BoxCoder(weights=(1.0, 1.0, 1.0, 1.0))

        # used during training
        self.box_similarity = box_ops.box_iou

        self.proposal_matcher = det_utils.Matcher(
            fg_iou_thresh,
            bg_iou_thresh,
            allow_low_quality_matches=True,
        )

        self.fg_bg_sampler = det_utils.BalancedPositiveNegativeSampler(
            batch_size_per_image, positive_fraction
        )
        # used during testing
        self._pre_nms_top_n = pre_nms_top_n
        self._post_nms_top_n = post_nms_top_n
        self.nms_thresh = nms_thresh
        self.min_size = 1e-3

    def pre_nms_top_n(self):
        if self.training:
            return self._pre_nms_top_n['training']
        return self._pre_nms_top_n['testing']

    def post_nms_top_n(self):
        if self.training:
            return self._post_nms_top_n['training']
        return self._post_nms_top_n['testing']

    def assign_targets_to_anchors(self, anchors, targets):
        # type: (List[Tensor], List[Dict[str, Tensor]]) -> Tuple[List[Tensor], List[Tensor]]
        labels = []
        matched_gt_boxes = []
        for anchors_per_image, targets_per_image in zip(anchors, targets):
            gt_boxes = targets_per_image["boxes"]

            if gt_boxes.numel() == 0:
                # Background image (negative example)
                device = anchors_per_image.device
                matched_gt_boxes_per_image = torch.zeros(anchors_per_image.shape, dtype=torch.float32, device=device)
                labels_per_image = torch.zeros((anchors_per_image.shape[0],), dtype=torch.float32, device=device)
            else:
                match_quality_matrix = self.box_similarity(gt_boxes, anchors_per_image)
                matched_idxs = self.proposal_matcher(match_quality_matrix)
                # get the targets corresponding GT for each proposal
                # NB: need to clamp the indices because we can have a single
                # GT in the image, and matched_idxs can be -2, which goes
                # out of bounds
                matched_gt_boxes_per_image = gt_boxes[matched_idxs.clamp(min=0)]

                labels_per_image = matched_idxs >= 0
                labels_per_image = labels_per_image.to(dtype=torch.float32)

                # Background (negative examples)
                bg_indices = matched_idxs == self.proposal_matcher.BELOW_LOW_THRESHOLD
                labels_per_image[bg_indices] = 0.0

                # discard indices that are between thresholds
                inds_to_discard = matched_idxs == self.proposal_matcher.BETWEEN_THRESHOLDS
                labels_per_image[inds_to_discard] = -1.0

            labels.append(labels_per_image)
            matched_gt_boxes.append(matched_gt_boxes_per_image)
        return labels, matched_gt_boxes

    def _get_top_n_idx(self, objectness, num_anchors_per_level):
        # type: (Tensor, List[int]) -> Tensor
        r = []
        offset = 0
        for ob in objectness.split(num_anchors_per_level, 1):
            if torchvision._is_tracing():
                num_anchors, pre_nms_top_n = _onnx_get_num_anchors_and_pre_nms_top_n(ob, self.pre_nms_top_n())
            else:
                num_anchors = ob.shape[1]
                pre_nms_top_n = min(self.pre_nms_top_n(), num_anchors)
            _, top_n_idx = ob.topk(pre_nms_top_n, dim=1)
            r.append(top_n_idx + offset)
            offset += num_anchors
        return torch.cat(r, dim=1)

    def filter_proposals(self, proposals, objectness, image_shapes, num_anchors_per_level):
        # type: (Tensor, Tensor, List[Tuple[int, int]], List[int]) -> Tuple[List[Tensor], List[Tensor]]
        num_images = proposals.shape[0]
        device = proposals.device
        # do not backprop throught objectness
        objectness = objectness.detach()
        objectness = objectness.reshape(num_images, -1)

        levels = [
            torch.full((n,), idx, dtype=torch.int64, device=device)
            for idx, n in enumerate(num_anchors_per_level)
        ]
        levels = torch.cat(levels, 0)
        levels = levels.reshape(1, -1).expand_as(objectness)

        # select top_n boxes independently per level before applying nms
        top_n_idx = self._get_top_n_idx(objectness, num_anchors_per_level)

        image_range = torch.arange(num_images, device=device)
        batch_idx = image_range[:, None]

        objectness = objectness[batch_idx, top_n_idx]
        levels = levels[batch_idx, top_n_idx]
        proposals = proposals[batch_idx, top_n_idx]

        final_boxes = []
        final_scores = []
        for boxes, scores, lvl, img_shape in zip(proposals, objectness, levels, image_shapes):
            boxes = box_ops.clip_boxes_to_image(boxes, img_shape)
            keep = box_ops.remove_small_boxes(boxes, self.min_size)
            boxes, scores, lvl = boxes[keep], scores[keep], lvl[keep]
            # non-maximum suppression, independently done per level
            keep = box_ops.batched_nms(boxes, scores, lvl, self.nms_thresh)
            # keep only topk scoring predictions
            keep = keep[:self.post_nms_top_n()]
            boxes, scores = boxes[keep], scores[keep]
            final_boxes.append(boxes)
            final_scores.append(scores)
        return final_boxes, final_scores

    def compute_loss(self, objectness, pred_bbox_deltas, labels, regression_targets):
        # type: (Tensor, Tensor, List[Tensor], List[Tensor]) -> Tuple[Tensor, Tensor]
        """
        Arguments:
            objectness (Tensor)
            pred_bbox_deltas (Tensor)
            labels (List[Tensor])
            regression_targets (List[Tensor])

        Returns:
            objectness_loss (Tensor)
            box_loss (Tensor)
        """

        sampled_pos_inds, sampled_neg_inds = self.fg_bg_sampler(labels)
        sampled_pos_inds = torch.where(torch.cat(sampled_pos_inds, dim=0))[0]
        sampled_neg_inds = torch.where(torch.cat(sampled_neg_inds, dim=0))[0]

        sampled_inds = torch.cat([sampled_pos_inds, sampled_neg_inds], dim=0)

        objectness = objectness.flatten()

        labels = torch.cat(labels, dim=0)
        regression_targets = torch.cat(regression_targets, dim=0)

        box_loss = det_utils.smooth_l1_loss(
            pred_bbox_deltas[sampled_pos_inds],
            regression_targets[sampled_pos_inds],
            beta=1 / 9,
            size_average=False,
        ) / (sampled_inds.numel())

        objectness_loss = F.binary_cross_entropy_with_logits(
            objectness[sampled_inds], labels[sampled_inds]
        )

        return objectness_loss, box_loss

    def forward(self,
                images,       # type: ImageList
                features,     # type: Dict[str, Tensor]
                targets=None  # type: Optional[List[Dict[str, Tensor]]]
                ):
        # type: (...) -> Tuple[List[Tensor], Dict[str, Tensor]]
        """
        Arguments:
            images (ImageList): images for which we want to compute the predictions
            features (OrderedDict[Tensor]): features computed from the images that are
                used for computing the predictions. Each tensor in the list
                correspond to different feature levels
            targets (List[Dict[Tensor]]): ground-truth boxes present in the image (optional).
                If provided, each element in the dict should contain a field `boxes`,
                with the locations of the ground-truth boxes.

        Returns:
            boxes (List[Tensor]): the predicted boxes from the RPN, one Tensor per
                image.
            losses (Dict[Tensor]): the losses for the model during training. During
                testing, it is an empty dict.
        """
        # RPN uses all feature maps that are available
        features = list(features.values())
        objectness, pred_bbox_deltas = self.head(features)
        anchors = self.anchor_generator(images, features)

        num_images = len(anchors)
        num_anchors_per_level_shape_tensors = [o[0].shape for o in objectness]
        num_anchors_per_level = [s[0] * s[1] * s[2] for s in num_anchors_per_level_shape_tensors]
        objectness, pred_bbox_deltas = \
            concat_box_prediction_layers(objectness, pred_bbox_deltas)
        # apply pred_bbox_deltas to anchors to obtain the decoded proposals
        # note that we detach the deltas because Faster R-CNN do not backprop through
        # the proposals
        proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
        proposals = proposals.view(num_images, -1, 4)
        boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)

        losses = {}
        if self.training:
            assert targets is not None
            labels, matched_gt_boxes = self.assign_targets_to_anchors(anchors, targets)
            regression_targets = self.box_coder.encode(matched_gt_boxes, anchors)
            loss_objectness, loss_rpn_box_reg = self.compute_loss(
                objectness, pred_bbox_deltas, labels, regression_targets)
            losses = {
                "loss_objectness": loss_objectness,
                "loss_rpn_box_reg": loss_rpn_box_reg,
            }
        return boxes, losses

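The key part is the computation flow in forward, which covers the complete RPN process:

The input features are fed into RPNHead (self.head), which outputs the object/non-object classification scores (objectness) and the bbox regression values (pred_bbox_deltas);
self.anchor_generator generates the anchors for the current input images and feature maps;
self.box_coder.decode applies the bbox regression values pred_bbox_deltas to the anchors to obtain the predicted proposals;
The resulting proposals can be numerous and heavily overlapping, so self.filter_proposals filters them and outputs the remaining proposals together with their scores;
During training, the RPN also computes its losses: the anchors are matched to the ground-truth boxes (self.assign_targets_to_anchors), the matched boxes are encoded into regression targets (self.box_coder.encode), and self.compute_loss compares objectness and pred_bbox_deltas against these targets.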
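
To make the loss step concrete: in compute_loss above, the box-regression term is a smooth L1 loss with beta = 1/9, summed over the sampled positive anchors and divided by the total number of sampled anchors. Below is a minimal plain-Python sketch of that arithmetic, not the torchvision code itself. The helper names smooth_l1 and rpn_box_loss are made up for illustration, and the piecewise form (quadratic below beta, linear above) is the usual smooth L1 definition, which is assumed here to match det_utils.smooth_l1_loss.

```python
def smooth_l1(pred, target, beta=1.0 / 9):
    """Summed smooth L1 over two equally long lists of floats."""
    total = 0.0
    for p, t in zip(pred, target):
        n = abs(p - t)
        if n < beta:
            total += 0.5 * n * n / beta   # quadratic near zero
        else:
            total += n - 0.5 * beta       # linear for larger errors
    return total


def rpn_box_loss(pos_deltas, pos_targets, num_sampled):
    """Sum the loss over positive anchors, normalize by all sampled anchors.

    pos_deltas / pos_targets: one [dx, dy, dw, dh] list per positive anchor.
    num_sampled: positives plus negatives picked by the sampler.
    """
    total = sum(smooth_l1(d, t) for d, t in zip(pos_deltas, pos_targets))
    return total / num_sampled


# Two positive anchors out of 4 sampled anchors (hypothetical numbers).
deltas = [[0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
targets = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
loss = rpn_box_loss(deltas, targets, num_sampled=4)
```

Dividing by the number of all sampled anchors, rather than by the number of positives, mirrors the `/ (sampled_inds.numel())` in the listing.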

AnchorGenerator

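AnchorGenerator is implemented in torchvision.models.detection.anchor_utils. Given the predefined anchor sizes and aspect_ratios, and the size ratio between the image and each feature map, it computes the anchors that correspond to each feature map.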

class AnchorGenerator(nn.Module):
    """
    Module that generates anchors for a set of feature maps and
    image sizes.

    The module support computing anchors at multiple sizes and aspect ratios
    per feature map. This module assumes aspect ratio = height / width for
    each anchor.

    sizes and aspect_ratios should have the same number of elements, and it should
    correspond to the number of feature maps.

    sizes[i] and aspect_ratios[i] can have an arbitrary number of elements,
    and AnchorGenerator will output a set of sizes[i] * aspect_ratios[i] anchors
    per spatial location for feature map i.

    Arguments:
        sizes (Tuple[Tuple[int]]):
        aspect_ratios (Tuple[Tuple[float]]):
    """

    __annotations__ = {
        "cell_anchors": Optional[List[torch.Tensor]],
        "_cache": Dict[str, List[torch.Tensor]]
    }

    def __init__(
        self,
        sizes=((128, 256, 512),),
        aspect_ratios=((0.5, 1.0, 2.0),),
    ):
        super(AnchorGenerator, self).__init__()

        if not isinstance(sizes[0], (list, tuple)):
            # TODO change this
            sizes = tuple((s,) for s in sizes)
        if not isinstance(aspect_ratios[0], (list, tuple)):
            aspect_ratios = (aspect_ratios,) * len(sizes)

        assert len(sizes) == len(aspect_ratios)

        self.sizes = sizes
        self.aspect_ratios = aspect_ratios
        self.cell_anchors = None
        self._cache = {}

    # TODO: https://github.com/pytorch/pytorch/issues/26792
    # For every (aspect_ratios, scales) combination, output a zero-centered anchor with those values.
    # (scales, aspect_ratios) are usually an element of zip(self.scales, self.aspect_ratios)
    # This method assumes aspect ratio = height / width for an anchor.
    def generate_anchors(self, scales, aspect_ratios, dtype=torch.float32, device="cpu"):
        # type: (List[int], List[float], int, Device) -> Tensor  # noqa: F821
        scales = torch.as_tensor(scales, dtype=dtype, device=device)
        aspect_ratios = torch.as_tensor(aspect_ratios, dtype=dtype, device=device)
        h_ratios = torch.sqrt(aspect_ratios)
        w_ratios = 1 / h_ratios

        ws = (w_ratios[:, None] * scales[None, :]).view(-1)
        hs = (h_ratios[:, None] * scales[None, :]).view(-1)

        base_anchors = torch.stack([-ws, -hs, ws, hs], dim=1) / 2
        return base_anchors.round()

    def set_cell_anchors(self, dtype, device):
        # type: (int, Device) -> None  # noqa: F821
        if self.cell_anchors is not None:
            cell_anchors = self.cell_anchors
            assert cell_anchors is not None
            # suppose that all anchors have the same device
            # which is a valid assumption in the current state of the codebase
            if cell_anchors[0].device == device:
                return

        cell_anchors = [
            self.generate_anchors(
                sizes,
                aspect_ratios,
                dtype,
                device
            )
            for sizes, aspect_ratios in zip(self.sizes, self.aspect_ratios)
        ]
        self.cell_anchors = cell_anchors

    def num_anchors_per_location(self):
        return [len(s) * len(a) for s, a in zip(self.sizes, self.aspect_ratios)]

    # For every combination of (a, (g, s), i) in (self.cell_anchors, zip(grid_sizes, strides), 0:2),
    # output g[i] anchors that are s[i] distance apart in direction i, with the same dimensions as a.
    def grid_anchors(self, grid_sizes, strides):
        # type: (List[List[int]], List[List[Tensor]]) -> List[Tensor]
        anchors = []
        cell_anchors = self.cell_anchors
        assert cell_anchors is not None
        assert len(grid_sizes) == len(strides) == len(cell_anchors)

        for size, stride, base_anchors in zip(
            grid_sizes, strides, cell_anchors
        ):
            grid_height, grid_width = size
            stride_height, stride_width = stride
            device = base_anchors.device

            # For output anchor, compute [x_center, y_center, x_center, y_center]
            shifts_x = torch.arange(
                0, grid_width, dtype=torch.float32, device=device
            ) * stride_width
            shifts_y = torch.arange(
                0, grid_height, dtype=torch.float32, device=device
            ) * stride_height
            shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)
            shift_x = shift_x.reshape(-1)
            shift_y = shift_y.reshape(-1)
            shifts = torch.stack((shift_x, shift_y, shift_x, shift_y), dim=1)

            # For every (base anchor, output anchor) pair,
            # offset each zero-centered base anchor by the center of the output anchor.
            anchors.append(
                (shifts.view(-1, 1, 4) + base_anchors.view(1, -1, 4)).reshape(-1, 4)
            )

        return anchors

    def cached_grid_anchors(self, grid_sizes, strides):
        # type: (List[List[int]], List[List[Tensor]]) -> List[Tensor]
        key = str(grid_sizes) + str(strides)
        if key in self._cache:
            return self._cache[key]
        anchors = self.grid_anchors(grid_sizes, strides)
        self._cache[key] = anchors
        return anchors

    def forward(self, image_list, feature_maps):
        # type: (ImageList, List[Tensor]) -> List[Tensor]
        grid_sizes = list([feature_map.shape[-2:] for feature_map in feature_maps])
        image_size = image_list.tensors.shape[-2:]
        dtype, device = feature_maps[0].dtype, feature_maps[0].device
        strides = [[torch.tensor(image_size[0] // g[0], dtype=torch.int64, device=device),
                    torch.tensor(image_size[1] // g[1], dtype=torch.int64, device=device)] for g in grid_sizes]
        self.set_cell_anchors(dtype, device)
        anchors_over_all_feature_maps = self.cached_grid_anchors(grid_sizes, strides)
        anchors = torch.jit.annotate(List[List[torch.Tensor]], [])
        for i, (image_height, image_width) in enumerate(image_list.image_sizes):
            anchors_in_image = []
            for anchors_per_feature_map in anchors_over_all_feature_maps:
                anchors_in_image.append(anchors_per_feature_map)
            anchors.append(anchors_in_image)
        anchors = [torch.cat(anchors_per_image) for anchors_per_image in anchors]
        # Clear the cache in case that memory leaks.
        self._cache.clear()
        return anchors

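This code is also fairly long, but again the main thing to follow is the computation flow in forward. After a series of preparatory computations comes a two-level for loop, which expresses the following: each image is passed in with $n$ feature maps (of different sizes), and the $i$-th feature map carries $k_i$ anchors, so each image has $K = \sum_{i=1}^{n}k_i$ anchors. Each sliding-window position has $A$ anchors and the $i$-th feature map has $L_i$ sliding-window positions, so that feature map has $k_i = A L_i$ anchors.

The outer loop iterates over the images and the inner loop over the feature maps, which yields the anchors on the feature maps of every image.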

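In the concrete implementation, AnchorGenerator works as follows:

self.set_cell_anchors generates self.cell_anchors for each feature map level; the sizes of these cell anchors are expressed in the coordinates of the input image tensor;
self.cached_grid_anchors in turn calls self.grid_anchors, which uses the grid size of each feature map and its stride relative to the input image tensor to compute anchors_over_all_feature_maps, again expressed in the coordinates of the input image tensor;
In the double for loop, the anchors are replicated $N$ times for $N$ input images;
Finally, torch.cat flattens all $K$ anchors across the feature maps of each image, giving a list of length $N$ whose elements are anchor tensors of shape $K \times 4$.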
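
The grid step can be illustrated without tensors. The following is a minimal plain-Python sketch of what grid_anchors and the final concatenation do; it is not the torchvision code. The function names and the toy grid sizes, strides, and base anchors are made up for illustration.

```python
def grid_anchors(grid_size, stride, base_anchors):
    """Shift zero-centered base anchors to every cell of one feature map.

    grid_size: (grid_height, grid_width) of the feature map.
    stride: (stride_height, stride_width) in input-image pixels.
    base_anchors: list of [x1, y1, x2, y2] boxes centered at the origin.
    """
    grid_h, grid_w = grid_size
    stride_h, stride_w = stride
    anchors = []
    for gy in range(grid_h):            # rows first, matching the meshgrid order
        for gx in range(grid_w):
            sx, sy = gx * stride_w, gy * stride_h
            for x1, y1, x2, y2 in base_anchors:
                anchors.append([x1 + sx, y1 + sy, x2 + sx, y2 + sy])
    return anchors


def anchors_for_batch(num_images, grid_sizes, strides, cell_anchors):
    """One flat anchor list per image; every image gets the same anchors."""
    per_image = []
    for size, stride, base in zip(grid_sizes, strides, cell_anchors):
        per_image.extend(grid_anchors(size, stride, base))
    return [list(per_image) for _ in range(num_images)]


# Toy setup: 2 images, 2 levels, 1 base anchor per level (made-up values).
batch = anchors_for_batch(
    num_images=2,
    grid_sizes=[(2, 3), (1, 1)],
    strides=[(16, 16), (32, 32)],
    cell_anchors=[[[-8, -8, 8, 8]], [[-16, -16, 16, 16]]],
)
```

Here each image ends up with 2 × 3 × 1 + 1 × 1 × 1 = 7 anchors, all in input-image coordinates.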

In the running example, the torchvision defaults are:

3 aspect ratios: 0.5, 1.0, 2.0;
One scale per feature map level; for the 5 feature map levels these are 32, 64, 128, 256, 512.

Therefore, in cell_anchors every feature map level has 3 cell anchors:

'0': shape[3, 4]
'1': shape[3, 4]
'2': shape[3, 4]
'3': shape[3, 4]
'4': shape[3, 4]
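
As a rough illustration of where the 3 rows per level come from, here is a plain-Python sketch of the arithmetic in generate_anchors, without tensors. It follows the listing above (aspect ratio = height / width, rounding at the end); the scale 32 is only an example value.

```python
import math


def generate_anchors(scales, aspect_ratios):
    """Zero-centered anchors for every (aspect_ratio, scale) pair.

    Aspect ratio is height / width, as in the AnchorGenerator docstring.
    """
    anchors = []
    for ratio in aspect_ratios:
        h_ratio = math.sqrt(ratio)
        w_ratio = 1 / h_ratio
        for scale in scales:
            w, h = w_ratio * scale, h_ratio * scale
            anchors.append(
                [round(-w / 2), round(-h / 2), round(w / 2), round(h / 2)]
            )
    return anchors


# One scale and three aspect ratios give three cell anchors, i.e. shape [3, 4].
cell = generate_anchors(scales=(32,), aspect_ratios=(0.5, 1.0, 2.0))
for anchor in cell:
    print(anchor)
```

Because the height and width factors are sqrt(ratio) and 1 / sqrt(ratio), the three anchors have roughly the same area and differ only in shape.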

Continuing the example, placing the cell_anchors at every sliding-window position of the input image tensor yields anchors_over_all_feature_maps, the anchors at all positions:

'0': shape[208896, 4], $208896 = 256 \times 272 \times 3$;
'1': shape[52224, 4], $52224 = 128 \times 136 \times 3$;
'2': shape[13056, 4], $13056 = 64 \times 68 \times 3$;
'3': shape[3264, 4], $3264 = 32 \times 34 \times 3$;
'4': shape[816, 4], $816 = 16 \times 17 \times 3$;

The returned anchors are replicated once per input image and flattened with torch.cat:

shape[278256, 4], $278256 = 208896 + 52224 + 13056 + 3264 + 816$;
shape[278256, 4], $278256 = 208896 + 52224 + 13056 + 3264 + 816$;
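
The counts above follow directly from $k_i = A L_i$ and $K = \sum_{i=1}^{n} k_i$. A small plain-Python check, using the feature map sizes of this example (the function name anchors_per_image is made up for illustration):

```python
def anchors_per_image(grid_sizes, anchors_per_location):
    """Return (k_i per level, K) for one image."""
    per_level = [
        h * w * a for (h, w), a in zip(grid_sizes, anchors_per_location)
    ]
    return per_level, sum(per_level)


# Feature map sizes of the running example, 3 anchors per location per level.
grid_sizes = [(256, 272), (128, 136), (64, 68), (32, 34), (16, 17)]
per_level, total = anchors_per_image(grid_sizes, [3] * len(grid_sizes))
print(per_level)  # [208896, 52224, 13056, 3264, 816]
print(total)      # 278256
```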

RPNHead

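RPNHead is implemented in the torchvision.models.detection.rpn module. It slides over the feature maps produced by feature extraction and, for every anchor, computes the bbox regression values and the object/non-object binary classification.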

class RPNHead(nn.Module):
    """
    Adds a simple RPN Head with classification and regression heads

    Arguments:
        in_channels (int): number of channels of the input feature
        num_anchors (int): number of anchors to be predicted
    """

    def __init__(self, in_channels, num_anchors):
        super(RPNHead, self).__init__()
        self.conv = nn.Conv2d(
            in_channels, in_channels, kernel_size=3, stride=1, padding=1
        )
        self.cls_logits = nn.Conv2d(in_channels, num_anchors, kernel_size=1, stride=1)
        self.bbox_pred = nn.Conv2d(
            in_channels, num_anchors * 4, kernel_size=1, stride=1
        )

        for layer in self.children():
            torch.nn.init.normal_(layer.weight, std=0.01)
            torch.nn.init.constant_(layer.bias, 0)

    def forward(self, x):
        # type: (List[Tensor]) -> Tuple[List[Tensor], List[Tensor]]
        logits = []
        bbox_reg = []
        for feature in x:
            t = F.relu(self.conv(feature))
            logits.append(self.cls_logits(t))
            bbox_reg.append(self.bbox_pred(t))
        return logits, bbox_reg

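As the code shows, RPNHead has a simple structure consisting of just three convolutions:

self.conv: a 3×3 convolution applied to the input feature map;
self.cls_logits: a 1×1 convolution applied to the processed feature map $t$, producing the object/non-object classification values;
self.bbox_pred: a 1×1 convolution applied to the processed feature map $t$, producing the regression values for the bbox coordinates.

In the forward pass, the input x is a List[Tensor], i.e. the output of the FPN. Note that the for loop does not iterate over the images but over each feature level output by the FPN.

In this example, the two convolution branches of RPNHead output two List[Tensor]: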

logits (objectness):

'0': shape[2, 3, 256, 272];
'1': shape[2, 3, 128, 136];
'2': shape[2, 3, 64, 68];
'3': shape[2, 3, 32, 34];
'4': shape[2, 3, 16, 17];

bbox_regs (pred_bbox_deltas):

'0': shape[2, 12, 256, 272];
'1': shape[2, 12, 128, 136];
'2': shape[2, 12, 64, 68];
'3': shape[2, 12, 32, 34];
'4': shape[2, 12, 16, 17];

Each sliding-window position has three aspect ratios, i.e. 3 anchors, so logits has 3 channels, while bbox_regs has 4 coordinates per anchor and therefore 12 channels.
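
The channel and spatial sizes can be checked with the standard convolution output-size formula. The sketch below is plain Python and only does shape arithmetic; the helper names are made up, and the input channel count of 256 is an assumption that does not affect the output shapes.

```python
def conv2d_out_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


def rpn_head_output_shapes(feature_shape, num_anchors):
    """Shapes of (logits, bbox_reg) for one feature level.

    feature_shape: (batch, channels, height, width) of the input feature map.
    """
    n, _, h, w = feature_shape
    # The 3x3 conv with stride 1 and padding 1 keeps the spatial size.
    h = conv2d_out_size(h, kernel_size=3, stride=1, padding=1)
    w = conv2d_out_size(w, kernel_size=3, stride=1, padding=1)
    # The two 1x1 convs only change the channel count.
    h = conv2d_out_size(h, kernel_size=1)
    w = conv2d_out_size(w, kernel_size=1)
    return (n, num_anchors, h, w), (n, num_anchors * 4, h, w)


# Coarsest level of the example: batch of 2, 16 x 17 feature map, 3 anchors.
logits_shape, bbox_shape = rpn_head_output_shapes((2, 256, 16, 17), num_anchors=3)
print(logits_shape)  # (2, 3, 16, 17)
print(bbox_shape)    # (2, 12, 16, 17)
```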

BoxCoder

BoxCoder is implemented in torchvision.models.detection._utils.

class BoxCoder(object):
    """
    This class encodes and decodes a set of bounding boxes into
    the representation used for training the regressors.
    """

    def __init__(self, weights, bbox_xform_clip=math.log(1000. / 16)):
        # type: (Tuple[float, float, float, float], float) -> None
        """
        Arguments:
            weights (4-element tuple)
            bbox_xform_clip (float)
        """
        self.weights = weights
        self.bbox_xform_clip = bbox_xform_clip

    def encode(self, reference_boxes, proposals):
        # type: (List[Tensor], List[Tensor]) -> List[Tensor]
        boxes_per_image = [len(b) for b in reference_boxes]
        reference_boxes = torch.cat(reference_boxes, dim=0)
        proposals = torch.cat(proposals, dim=0)
        targets = self.encode_single(reference_boxes, proposals)
        return targets.split(boxes_per_image, 0)

    def encode_single(self, reference_boxes, proposals):
        """
        Encode a set of proposals with respect to some
        reference boxes

        Arguments:
            reference_boxes (Tensor): reference boxes
            proposals (Tensor): boxes to be encoded
        """
        dtype = reference_boxes.dtype
        device = reference_boxes.device
        weights = torch.as_tensor(self.weights, dtype=dtype, device=device)
        targets = encode_boxes(reference_boxes, proposals, weights)

        return targets

    def decode(self, rel_codes, boxes):
        # type: (Tensor, List[Tensor]) -> Tensor
        assert isinstance(boxes, (list, tuple))
        assert isinstance(rel_codes, torch.Tensor)
        boxes_per_image = [b.size(0) for b in boxes]
        concat_boxes = torch.cat(boxes, dim=0)
        box_sum = 0
        for val in boxes_per_image:
            box_sum += val
        pred_boxes = self.decode_single(
            rel_codes.reshape(box_sum, -1), concat_boxes
        )
        return pred_boxes.reshape(box_sum, -1, 4)

    def decode_single(self, rel_codes, boxes):
        """
        From a set of original boxes and encoded relative box offsets,
        get the decoded boxes.

        Arguments:
            rel_codes (Tensor): encoded boxes
            boxes (Tensor): reference boxes.
        """

        boxes = boxes.to(rel_codes.dtype)

        widths = boxes[:, 2] - boxes[:, 0]
        heights = boxes[:, 3] - boxes[:, 1]
        ctr_x = boxes[:, 0] + 0.5 * widths
        ctr_y = boxes[:, 1] + 0.5 * heights

        wx, wy, ww, wh = self.weights
        dx = rel_codes[:, 0::4] / wx
        dy = rel_codes[:, 1::4] / wy
        dw = rel_codes[:, 2::4] / ww
        dh = rel_codes[:, 3::4] / wh

        # Prevent sending too large values into torch.exp()
        dw = torch.clamp(dw, max=self.bbox_xform_clip)
        dh = torch.clamp(dh, max=self.bbox_xform_clip)

        pred_ctr_x = dx * widths[:, None] + ctr_x[:, None]
        pred_ctr_y = dy * heights[:, None] + ctr_y[:, None]
        pred_w = torch.exp(dw) * widths[:, None]
        pred_h = torch.exp(dh) * heights[:, None]

        pred_boxes1 = pred_ctr_x - torch.tensor(0.5, dtype=pred_ctr_x.dtype, device=pred_w.device) * pred_w
        pred_boxes2 = pred_ctr_y - torch.tensor(0.5, dtype=pred_ctr_y.dtype, device=pred_h.device) * pred_h
        pred_boxes3 = pred_ctr_x + torch.tensor(0.5, dtype=pred_ctr_x.dtype, device=pred_w.device) * pred_w
        pred_boxes4 = pred_ctr_y + torch.tensor(0.5, dtype=pred_ctr_y.dtype, device=pred_h.device) * pred_h
        pred_boxes = torch.stack((pred_boxes1, pred_boxes2, pred_boxes3, pred_boxes4), dim=2).flatten(1)
        return pred_boxes

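In the RPN, self.box_coder uses BoxCoder as the encoder/decoder for bboxes:

proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)

Through the BoxCoder instance, the pred_bbox_deltas regressed by RPNHead are decoded against the RPN anchors: the regressed offsets are applied to the reference anchor positions, and the decoded proposals are output.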
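
The decoding arithmetic for a single box can be sketched in plain Python as follows. This mirrors decode_single in the listing above but works on one anchor at a time; the function name decode_box is made up, and the weights (1, 1, 1, 1) are an assumption chosen to keep the numbers easy to follow.

```python
import math

BBOX_XFORM_CLIP = math.log(1000.0 / 16)  # same default as BoxCoder.__init__


def decode_box(delta, anchor, weights=(1.0, 1.0, 1.0, 1.0)):
    """Apply one [dx, dy, dw, dh] offset to one [x1, y1, x2, y2] anchor."""
    x1, y1, x2, y2 = anchor
    width, height = x2 - x1, y2 - y1
    ctr_x, ctr_y = x1 + 0.5 * width, y1 + 0.5 * height

    wx, wy, ww, wh = weights
    dx, dy = delta[0] / wx, delta[1] / wy
    # Clamp the log-scale terms so exp() cannot blow up.
    dw = min(delta[2] / ww, BBOX_XFORM_CLIP)
    dh = min(delta[3] / wh, BBOX_XFORM_CLIP)

    pred_ctr_x = dx * width + ctr_x
    pred_ctr_y = dy * height + ctr_y
    pred_w = math.exp(dw) * width
    pred_h = math.exp(dh) * height
    return [
        pred_ctr_x - 0.5 * pred_w,
        pred_ctr_y - 0.5 * pred_h,
        pred_ctr_x + 0.5 * pred_w,
        pred_ctr_y + 0.5 * pred_h,
    ]


anchor = [0.0, 0.0, 10.0, 10.0]
unchanged = decode_box([0.0, 0.0, 0.0, 0.0], anchor)  # zero offsets keep the anchor
shifted = decode_box([0.1, 0.0, 0.0, 0.0], anchor)    # center moves by 0.1 * width
doubled = decode_box([0.0, 0.0, math.log(2.0), 0.0], anchor)  # width is doubled
```

The center offsets are relative to the anchor size, while width and height are scaled through exp, so a delta of zero always reproduces the anchor.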

In this example, the RPN forward reshapes the raw decoded proposals with proposals = proposals.view(num_images, -1, 4), so the resulting proposals have shape:

shape[2, 278256, 4]

filter_proposals

filter_proposals filters the proposals generated from the RPNHead outputs. It is implemented as a member function of the RPN class RegionProposalNetwork.

class RegionProposalNetwork(torch.nn.Module):
    
    # ...
    
    def filter_proposals(self, proposals, objectness, image_shapes, num_anchors_per_level):
        # type: (Tensor, Tensor, List[Tuple[int, int]], List[int]) -> Tuple[List[Tensor], List[Tensor]]
        num_images = proposals.shape[0]
        device = proposals.device
        # do not backprop through objectness
        objectness = objectness.detach()
        objectness = objectness.reshape(num_images, -1)

        levels = [
            torch.full((n,), idx, dtype=torch.int64, device=device)
            for idx, n in enumerate(num_anchors_per_level)
        ]
        levels = torch.cat(levels, 0)
        levels = levels.reshape(1, -1).expand_as(objectness)

        # select top_n boxes independently per level before applying nms
        top_n_idx = self._get_top_n_idx(objectness, num_anchors_per_level)

        image_range = torch.arange(num_images, device=device)
        batch_idx = image_range[:, None]

        objectness = objectness[batch_idx, top_n_idx]
        levels = levels[batch_idx, top_n_idx]
        proposals = proposals[batch_idx, top_n_idx]

        final_boxes = []
        final_scores = []
        for boxes, scores, lvl, img_shape in zip(proposals, objectness, levels, image_shapes):
            boxes = box_ops.clip_boxes_to_image(boxes, img_shape)
            keep = box_ops.remove_small_boxes(boxes, self.min_size)
            boxes, scores, lvl = boxes[keep], scores[keep], lvl[keep]
            # non-maximum suppression, independently done per level
            keep = box_ops.batched_nms(boxes, scores, lvl, self.nms_thresh)
            # keep only topk scoring predictions
            keep = keep[:self.post_nms_top_n()]
            boxes, scores = boxes[keep], scores[keep]
            final_boxes.append(boxes)
            final_scores.append(scores)
        return final_boxes, final_scores

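The filtering of proposals proceeds in several stages:

First, based on the objectness scores and num_anchors_per_level, top_n_idx (pre_nms_top_n) is selected on each level to pre-filter the proposals before NMS;
Then, a for loop iterates over every image in the batch:
1. clip the boxes to the image boundary and remove small boxes;
2. run NMS, independently per feature level;
3. keep the post_nms_top_n highest-scoring results of the current image as the return value.

Finally, the final_boxes and final_scores that survive all these filtering steps are returned (boxes are the filtered proposals, scores are the filtered objectness values).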
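
The per-image part of this procedure can be sketched in plain Python as below. This is a simplified illustration, not the torchvision implementation: the helper names are made up, the greedy loop stands in for box_ops.batched_nms (a box is only compared against kept boxes of the same level), and the pre-NMS top-n selection is left out.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def clip_box(box, image_shape):
    """Clamp a box to the image; image_shape is (height, width)."""
    h, w = image_shape
    x1, y1, x2, y2 = box
    return [min(max(x1, 0), w), min(max(y1, 0), h),
            min(max(x2, 0), w), min(max(y2, 0), h)]


def filter_one_image(boxes, scores, levels, image_shape,
                     min_size, nms_thresh, post_nms_top_n):
    boxes = [clip_box(b, image_shape) for b in boxes]
    # Drop boxes whose width or height is below min_size.
    keep = [i for i, b in enumerate(boxes)
            if b[2] - b[0] >= min_size and b[3] - b[1] >= min_size]
    # Greedy NMS in descending score order, done independently per level.
    kept = []
    for i in sorted(keep, key=lambda i: scores[i], reverse=True):
        if all(levels[i] != levels[j] or iou(boxes[i], boxes[j]) <= nms_thresh
               for j in kept):
            kept.append(i)
    kept = kept[:post_nms_top_n]
    return [boxes[i] for i in kept], [scores[i] for i in kept]


# Hypothetical toy input: 5 proposals on a 100 x 100 image.
boxes = [[0, 0, 10, 10], [0, 1, 10, 10], [0, 0, 10, 10],
         [50, 50, 50.5, 50.5], [-5, -5, 20, 20]]
scores = [0.9, 0.8, 0.7, 0.95, 0.6]
levels = [0, 0, 1, 0, 0]
final_boxes, final_scores = filter_one_image(
    boxes, scores, levels, image_shape=(100, 100),
    min_size=1, nms_thresh=0.7, post_nms_top_n=2)
```

In this toy input, the fourth box is removed for being too small, the second is suppressed by the first (same level, IoU 0.9), and the third survives even though it coincides with the first, because it lies on a different level.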

In this example there are two images, each with 278256 anchors and therefore 278256 proposals and objectness values. After filtering:

final_boxes:
1. shape[1000, 4]
2. shape[1000, 4]
final_scores:
1. shape 1000
2. shape 1000

Because the FasterRCNN default is rpn_post_nms_top_n_test=1000, in eval mode (i.e. testing/inference) each of the two example images ends up with its top-1000 boxes.

RoIHeads

RoIHeads is implemented in the torchvision.models.detection.roi_heads module.

class RoIHeads(torch.nn.Module):
    __annotations__ = {
        'box_coder': det_utils.BoxCoder,
        'proposal_matcher': det_utils.Matcher,
        'fg_bg_sampler': det_utils.BalancedPositiveNegativeSampler,
    }

    def __init__(self,
                 box_roi_pool,
                 box_head,
                 box_predictor,
                 # Faster R-CNN training
                 fg_iou_thresh, bg_iou_thresh,
                 batch_size_per_image, positive_fraction,
                 bbox_reg_weights,
                 # Faster R-CNN inference
                 score_thresh,
                 nms_thresh,
                 detections_per_img,
                 # Mask
                 mask_roi_pool=None,
                 mask_head=None,
                 mask_predictor=None,
                 keypoint_roi_pool=None,
                 keypoint_head=None,
                 keypoint_predictor=None,
                 ):
        super(RoIHeads, self).__init__()

        self.box_similarity = box_ops.box_iou
        # assign ground-truth boxes for each proposal
        self.proposal_matcher = det_utils.Matcher(
            fg_iou_thresh,
            bg_iou_thresh,
            allow_low_quality_matches=False)

        self.fg_bg_sampler = det_utils.BalancedPositiveNegativeSampler(
            batch_size_per_image,
            positive_fraction)

        if bbox_reg_weights is None:
            bbox_reg_weights = (10., 10., 5., 5.)
        self.box_coder = det_utils.BoxCoder(bbox_reg_weights)

        self.box_roi_pool = box_roi_pool
        self.box_head = box_head
        self.box_predictor = box_predictor

        self.score_thresh = score_thresh
        self.nms_thresh = nms_thresh
        self.detections_per_img = detections_per_img

        self.mask_roi_pool = mask_roi_pool
        self.mask_head = mask_head
        self.mask_predictor = mask_predictor

        self.keypoint_roi_pool = keypoint_roi_pool
        self.keypoint_head = keypoint_head
        self.keypoint_predictor = keypoint_predictor

    def has_mask(self):
        if self.mask_roi_pool is None:
            return False
        if self.mask_head is None:
            return False
        if self.mask_predictor is None:
            return False
        return True

    def has_keypoint(self):
        if self.keypoint_roi_pool is None:
            return False
        if self.keypoint_head is None:
            return False
        if self.keypoint_predictor is None:
            return False
        return True

    def assign_targets_to_proposals(self, proposals, gt_boxes, gt_labels):
        # type: (List[Tensor], List[Tensor], List[Tensor]) -> Tuple[List[Tensor], List[Tensor]]
        matched_idxs = []
        labels = []
        for proposals_in_image, gt_boxes_in_image, gt_labels_in_image in zip(proposals, gt_boxes, gt_labels):

            if gt_boxes_in_image.numel() == 0:
                # Background image
                device = proposals_in_image.device
                clamped_matched_idxs_in_image = torch.zeros(
                    (proposals_in_image.shape[0],), dtype=torch.int64, device=device
                )
                labels_in_image = torch.zeros(
                    (proposals_in_image.shape[0],), dtype=torch.int64, device=device
                )
            else:
                #  set to self.box_similarity when https://github.com/pytorch/pytorch/issues/27495 lands
                match_quality_matrix = box_ops.box_iou(gt_boxes_in_image, proposals_in_image)
                matched_idxs_in_image = self.proposal_matcher(match_quality_matrix)

                clamped_matched_idxs_in_image = matched_idxs_in_image.clamp(min=0)

                labels_in_image = gt_labels_in_image[clamped_matched_idxs_in_image]
                labels_in_image = labels_in_image.to(dtype=torch.int64)

                # Label background (below the low threshold)
                bg_inds = matched_idxs_in_image == self.proposal_matcher.BELOW_LOW_THRESHOLD
                labels_in_image[bg_inds] = 0

                # Label ignore proposals (between low and high thresholds)
                ignore_inds = matched_idxs_in_image == self.proposal_matcher.BETWEEN_THRESHOLDS
                labels_in_image[ignore_inds] = -1  # -1 is ignored by sampler

            matched_idxs.append(clamped_matched_idxs_in_image)
            labels.append(labels_in_image)
        return matched_idxs, labels

    def subsample(self, labels):
        # type: (List[Tensor]) -> List[Tensor]
        sampled_pos_inds, sampled_neg_inds = self.fg_bg_sampler(labels)
        sampled_inds = []
        for img_idx, (pos_inds_img, neg_inds_img) in enumerate(
            zip(sampled_pos_inds, sampled_neg_inds)
        ):
            img_sampled_inds = torch.where(pos_inds_img | neg_inds_img)[0]
            sampled_inds.append(img_sampled_inds)
        return sampled_inds

    def add_gt_proposals(self, proposals, gt_boxes):
        # type: (List[Tensor], List[Tensor]) -> List[Tensor]
        proposals = [
            torch.cat((proposal, gt_box))
            for proposal, gt_box in zip(proposals, gt_boxes)
        ]

        return proposals

    def check_targets(self, targets):
        # type: (Optional[List[Dict[str, Tensor]]]) -> None
        assert targets is not None
        assert all(["boxes" in t for t in targets])
        assert all(["labels" in t for t in targets])
        if self.has_mask():
            assert all(["masks" in t for t in targets])

    def select_training_samples(self,
                                proposals,  # type: List[Tensor]
                                targets     # type: Optional[List[Dict[str, Tensor]]]
                                ):
        # type: (...) -> Tuple[List[Tensor], List[Tensor], List[Tensor], List[Tensor]]
        self.check_targets(targets)
        assert targets is not None
        dtype = proposals[0].dtype
        device = proposals[0].device

        gt_boxes = [t["boxes"].to(dtype) for t in targets]
        gt_labels = [t["labels"] for t in targets]

        # append ground-truth bboxes to propos
        proposals = self.add_gt_proposals(proposals, gt_boxes)

        # get matching gt indices for each proposal
        matched_idxs, labels = self.assign_targets_to_proposals(proposals, gt_boxes, gt_labels)
        # sample a fixed proportion of positive-negative proposals
        sampled_inds = self.subsample(labels)
        matched_gt_boxes = []
        num_images = len(proposals)
        for img_id in range(num_images):
            img_sampled_inds = sampled_inds[img_id]
            proposals[img_id] = proposals[img_id][img_sampled_inds]
            labels[img_id] = labels[img_id][img_sampled_inds]
            matched_idxs[img_id] = matched_idxs[img_id][img_sampled_inds]

            gt_boxes_in_image = gt_boxes[img_id]
            if gt_boxes_in_image.numel() == 0:
                gt_boxes_in_image = torch.zeros((1, 4), dtype=dtype, device=device)
            matched_gt_boxes.append(gt_boxes_in_image[matched_idxs[img_id]])

        regression_targets = self.box_coder.encode(matched_gt_boxes, proposals)
        return proposals, matched_idxs, labels, regression_targets

    def postprocess_detections(self,
                               class_logits,    # type: Tensor
                               box_regression,  # type: Tensor
                               proposals,       # type: List[Tensor]
                               image_shapes     # type: List[Tuple[int, int]]
                               ):
        # type: (...) -> Tuple[List[Tensor], List[Tensor], List[Tensor]]
        device = class_logits.device
        num_classes = class_logits.shape[-1]

        boxes_per_image = [boxes_in_image.shape[0] for boxes_in_image in proposals]
        pred_boxes = self.box_coder.decode(box_regression, proposals)

        pred_scores = F.softmax(class_logits, -1)

        pred_boxes_list = pred_boxes.split(boxes_per_image, 0)
        pred_scores_list = pred_scores.split(boxes_per_image, 0)

        all_boxes = []
        all_scores = []
        all_labels = []
        for boxes, scores, image_shape in zip(pred_boxes_list, pred_scores_list, image_shapes):
            boxes = box_ops.clip_boxes_to_image(boxes, image_shape)

            # create labels for each prediction
            labels = torch.arange(num_classes, device=device)
            labels = labels.view(1, -1).expand_as(scores)

            # remove predictions with the background label
            boxes = boxes[:, 1:]
            scores = scores[:, 1:]
            labels = labels[:, 1:]

            # batch everything, by making every class prediction be a separate instance
            boxes = boxes.reshape(-1, 4)
            scores = scores.reshape(-1)
            labels = labels.reshape(-1)

            # remove low scoring boxes
            inds = torch.where(scores > self.score_thresh)[0]
            boxes, scores, labels = boxes[inds], scores[inds], labels[inds]

            # remove empty boxes
            keep = box_ops.remove_small_boxes(boxes, min_size=1e-2)
            boxes, scores, labels = boxes[keep], scores[keep], labels[keep]

            # non-maximum suppression, independently done per class
            keep = box_ops.batched_nms(boxes, scores, labels, self.nms_thresh)
            # keep only topk scoring predictions
            keep = keep[:self.detections_per_img]
            boxes, scores, labels = boxes[keep], scores[keep], labels[keep]

            all_boxes.append(boxes)
            all_scores.append(scores)
            all_labels.append(labels)

        return all_boxes, all_scores, all_labels

    def forward(self,
                features,      # type: Dict[str, Tensor]
                proposals,     # type: List[Tensor]
                image_shapes,  # type: List[Tuple[int, int]]
                targets=None   # type: Optional[List[Dict[str, Tensor]]]
                ):
        # type: (...) -> Tuple[List[Dict[str, Tensor]], Dict[str, Tensor]]
        """
        Arguments:
            features (List[Tensor])
            proposals (List[Tensor[N, 4]])
            image_shapes (List[Tuple[H, W]])
            targets (List[Dict])
        """
        if targets is not None:
            for t in targets:
                # TODO: https://github.com/pytorch/pytorch/issues/26731
                floating_point_types = (torch.float, torch.double, torch.half)
                assert t["boxes"].dtype in floating_point_types, 'target boxes must of float type'
                assert t["labels"].dtype == torch.int64, 'target labels must of int64 type'
                if self.has_keypoint():
                    assert t["keypoints"].dtype == torch.float32, 'target keypoints must of float type'

        if self.training:
            proposals, matched_idxs, labels, regression_targets = self.select_training_samples(proposals, targets)
        else:
            labels = None
            regression_targets = None
            matched_idxs = None

        box_features = self.box_roi_pool(features, proposals, image_shapes)
        box_features = self.box_head(box_features)
        class_logits, box_regression = self.box_predictor(box_features)

        result = torch.jit.annotate(List[Dict[str, torch.Tensor]], [])
        losses = {}
        if self.training:
            assert labels is not None and regression_targets is not None
            loss_classifier, loss_box_reg = fastrcnn_loss(
                class_logits, box_regression, labels, regression_targets)
            losses = {
                "loss_classifier": loss_classifier,
                "loss_box_reg": loss_box_reg
            }
        else:
            boxes, scores, labels = self.postprocess_detections(class_logits, box_regression, proposals, image_shapes)
            num_images = len(boxes)
            for i in range(num_images):
                result.append(
                    {
                        "boxes": boxes[i],
                        "labels": labels[i],
                        "scores": scores[i],
                    }
                )

        if self.has_mask():
            mask_proposals = [p["boxes"] for p in result]
            if self.training:
                assert matched_idxs is not None
                # during training, only focus on positive boxes
                num_images = len(proposals)
                mask_proposals = []
                pos_matched_idxs = []
                for img_id in range(num_images):
                    pos = torch.where(labels[img_id] > 0)[0]
                    mask_proposals.append(proposals[img_id][pos])
                    pos_matched_idxs.append(matched_idxs[img_id][pos])
            else:
                pos_matched_idxs = None

            if self.mask_roi_pool is not None:
                mask_features = self.mask_roi_pool(features, mask_proposals, image_shapes)
                mask_features = self.mask_head(mask_features)
                mask_logits = self.mask_predictor(mask_features)
            else:
                mask_logits = torch.tensor(0)
                raise Exception("Expected mask_roi_pool to be not None")

            loss_mask = {}
            if self.training:
                assert targets is not None
                assert pos_matched_idxs is not None
                assert mask_logits is not None

                gt_masks = [t["masks"] for t in targets]
                gt_labels = [t["labels"] for t in targets]
                rcnn_loss_mask = maskrcnn_loss(
                    mask_logits, mask_proposals,
                    gt_masks, gt_labels, pos_matched_idxs)
                loss_mask = {
                    "loss_mask": rcnn_loss_mask
                }
            else:
                labels = [r["labels"] for r in result]
                masks_probs = maskrcnn_inference(mask_logits, labels)
                for mask_prob, r in zip(masks_probs, result):
                    r["masks"] = mask_prob

            losses.update(loss_mask)

        # keep none checks in if conditional so torchscript will conditionally
        # compile each branch
        if self.keypoint_roi_pool is not None and self.keypoint_head is not None \
                and self.keypoint_predictor is not None:
            keypoint_proposals = [p["boxes"] for p in result]
            if self.training:
                # during training, only focus on positive boxes
                num_images = len(proposals)
                keypoint_proposals = []
                pos_matched_idxs = []
                assert matched_idxs is not None
                for img_id in range(num_images):
                    pos = torch.where(labels[img_id] > 0)[0]
                    keypoint_proposals.append(proposals[img_id][pos])
                    pos_matched_idxs.append(matched_idxs[img_id][pos])
            else:
                pos_matched_idxs = None

            keypoint_features = self.keypoint_roi_pool(features, keypoint_proposals, image_shapes)
            keypoint_features = self.keypoint_head(keypoint_features)
            keypoint_logits = self.keypoint_predictor(keypoint_features)

            loss_keypoint = {}
            if self.training:
                assert targets is not None
                assert pos_matched_idxs is not None

                gt_keypoints = [t["keypoints"] for t in targets]
                rcnn_loss_keypoint = keypointrcnn_loss(
                    keypoint_logits, keypoint_proposals,
                    gt_keypoints, pos_matched_idxs)
                loss_keypoint = {
                    "loss_keypoint": rcnn_loss_keypoint
                }
            else:
                assert keypoint_logits is not None
                assert keypoint_proposals is not None

                keypoints_probs, kp_scores = keypointrcnn_inference(keypoint_logits, keypoint_proposals)
                for keypoint_prob, kps, r in zip(keypoints_probs, kp_scores, result):
                    r["keypoints"] = keypoint_prob
                    r["keypoints_scores"] = kps

            losses.update(loss_keypoint)

        return result, losses

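The part to focus on is the implementation of the forward function. It is long, but if we only consider what Faster R-CNN needs (ignoring the image segmentation branch used by Mask R-CNN), it can be summarized as: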

def forward(self,
            features,      # type: Dict[str, Tensor]
            proposals,     # type: List[Tensor]
            image_shapes,  # type: List[Tuple[int, int]]
            targets=None   # type: Optional[List[Dict[str, Tensor]]]
            ):
    # 1. RoI Pool (or RoI Align)
    box_features = self.box_roi_pool(features, proposals, image_shapes)
    # 2. MLP Head
    box_features = self.box_head(box_features)
    # 3. Predictor
    class_logits, box_regression = self.box_predictor(box_features)

    result = torch.jit.annotate(List[Dict[str, torch.Tensor]], [])
    losses = {}
    if self.training:
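        # labels and regression_targets are produced by self.select_training_samples(proposals, targets),
        # which runs before RoI pooling in the full listing and is omitted from this summary
        assert labels is not None and regression_targets is not None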
        loss_classifier, loss_box_reg = fastrcnn_loss(
            class_logits, box_regression, labels, regression_targets)
        losses = {
            "loss_classifier": loss_classifier,
            "loss_box_reg": loss_box_reg
        }
    else:
        # 4. Postprocess Detections
        boxes, scores, labels = self.postprocess_detections(class_logits, box_regression, proposals, image_shapes)
        num_images = len(boxes)
        for i in range(num_images):
            result.append(
                {
                    "boxes": boxes[i],
                    "labels": labels[i],
                    "scores": scores[i],
                }
            )

    return result, losses

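The RoIHeads of Faster R-CNN consists of the following steps:

RoIPool: executed by box_features = self.box_roi_pool(features, proposals, image_shapes). In torchvision's Faster R-CNN, RoIPool is implemented by the MultiScaleRoIAlign class in torchvision.ops.poolers. Because objects differ in shape, the feature windows they cover differ as well. RoIPool divides each variable-size RoI window into a fixed grid and pools within each cell, turning variable-size RoI features into a fixed-size output that is convenient for later processing.
MLPHead: executed by box_features = self.box_head(box_features). It is implemented by the TwoMLPHead class in torchvision.models.detection.faster_rcnn. MLPHead takes the fixed-size features pooled by RoIPool, applies a nonlinear MLP transformation, and outputs the final features used by the downstream tasks (classification, regression, and so on).
Predictor: executed by class_logits, box_regression = self.box_predictor(box_features). It is implemented by the FastRCNNPredictor class in torchvision.models.detection.faster_rcnn. The features output by the MLP in the previous step are handed to the Predictor, which makes the task-specific predictions: object classification, and regression of the bbox position and size.
PostprocessDetections: executed by boxes, scores, labels = self.postprocess_detections(class_logits, box_regression, proposals, image_shapes). This is a member function of the RoIHeads class, and it is only called on the inference path.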
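
To make the data flow through these stages concrete, here is a minimal sketch in plain Python that only tracks tensor shapes. The helper name roi_heads_shapes is hypothetical (it is not part of torchvision), and the numbers passed in are the ones from the running example of this article (2 images, 1000 proposals each, 256 channels, 7x7 pooling, 1024-dimensional representation, 91 classes).

```python
def roi_heads_shapes(num_images, boxes_per_image, channels, pool_size,
                     representation_size, num_classes):
    """Return the tensor shape after each RoIHeads stage (hypothetical helper)."""
    num_rois = num_images * boxes_per_image
    return {
        # 1. RoI Pool / RoI Align: one fixed-size feature map per box
        "box_roi_pool": (num_rois, channels, pool_size, pool_size),
        # 2. MLP head: flatten, then two fully connected layers
        "box_head": (num_rois, representation_size),
        # 3. Predictor: class logits and 4 box deltas per class
        "class_logits": (num_rois, num_classes),
        "box_regression": (num_rois, num_classes * 4),
    }


shapes = roi_heads_shapes(2, 1000, 256, 7, 1024, 91)
for stage, shape in shapes.items():
    print(stage, shape)
```

The postprocessing step is left out on purpose, because the number of detections it returns depends on the scores rather than on the configuration.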

MultiScaleRoIAlign

torchvision uses MultiScaleRoIAlign from torchvision.ops.poolers as the RoI Pool implementation of Faster R-CNN.

class MultiScaleRoIAlign(nn.Module):
    """
    Multi-scale RoIAlign pooling, which is useful for detection with or without FPN.

    It infers the scale of the pooling via the heuristics present in the FPN paper.

    Arguments:
        featmap_names (List[str]): the names of the feature maps that will be used
            for the pooling.
        output_size (List[Tuple[int, int]] or List[int]): output size for the pooled region
        sampling_ratio (int): sampling ratio for ROIAlign

    Examples::

        >>> m = torchvision.ops.MultiScaleRoIAlign(['feat1', 'feat3'], 3, 2)
        >>> i = OrderedDict()
        >>> i['feat1'] = torch.rand(1, 5, 64, 64)
        >>> i['feat2'] = torch.rand(1, 5, 32, 32)  # this feature won't be used in the pooling
        >>> i['feat3'] = torch.rand(1, 5, 16, 16)
        >>> # create some random bounding boxes
        >>> boxes = torch.rand(6, 4) * 256; boxes[:, 2:] += boxes[:, :2]
        >>> # original image size, before computing the feature maps
        >>> image_sizes = [(512, 512)]
        >>> output = m(i, [boxes], image_sizes)
        >>> print(output.shape)
        >>> torch.Size([6, 5, 3, 3])

    """

    __annotations__ = {
        'scales': Optional[List[float]],
        'map_levels': Optional[LevelMapper]
    }

    def __init__(
        self,
        featmap_names: List[str],
        output_size: Union[int, Tuple[int], List[int]],
        sampling_ratio: int,
    ):
        super(MultiScaleRoIAlign, self).__init__()
        if isinstance(output_size, int):
            output_size = (output_size, output_size)
        self.featmap_names = featmap_names
        self.sampling_ratio = sampling_ratio
        self.output_size = tuple(output_size)
        self.scales = None
        self.map_levels = None

    def convert_to_roi_format(self, boxes: List[Tensor]) -> Tensor:
        concat_boxes = torch.cat(boxes, dim=0)
        device, dtype = concat_boxes.device, concat_boxes.dtype
        ids = torch.cat(
            [
                torch.full_like(b[:, :1], i, dtype=dtype, layout=torch.strided, device=device)
                for i, b in enumerate(boxes)
            ],
            dim=0,
        )
        rois = torch.cat([ids, concat_boxes], dim=1)
        return rois

    def infer_scale(self, feature: Tensor, original_size: List[int]) -> float:
        # assumption: the scale is of the form 2 ** (-k), with k integer
        size = feature.shape[-2:]
        possible_scales = torch.jit.annotate(List[float], [])
        for s1, s2 in zip(size, original_size):
            approx_scale = float(s1) / float(s2)
            scale = 2 ** float(torch.tensor(approx_scale).log2().round())
            possible_scales.append(scale)
        assert possible_scales[0] == possible_scales[1]
        return possible_scales[0]

    def setup_scales(
        self,
        features: List[Tensor],
        image_shapes: List[Tuple[int, int]],
    ) -> None:
        assert len(image_shapes) != 0
        max_x = 0
        max_y = 0
        for shape in image_shapes:
            max_x = max(shape[0], max_x)
            max_y = max(shape[1], max_y)
        original_input_shape = (max_x, max_y)

        scales = [self.infer_scale(feat, original_input_shape) for feat in features]
        # get the levels in the feature map by leveraging the fact that the network always
        # downsamples by a factor of 2 at each level.
        lvl_min = -torch.log2(torch.tensor(scales[0], dtype=torch.float32)).item()
        lvl_max = -torch.log2(torch.tensor(scales[-1], dtype=torch.float32)).item()
        self.scales = scales
        self.map_levels = initLevelMapper(int(lvl_min), int(lvl_max))

    def forward(
        self,
        x: Dict[str, Tensor],
        boxes: List[Tensor],
        image_shapes: List[Tuple[int, int]],
    ) -> Tensor:
        """
        Arguments:
            x (OrderedDict[Tensor]): feature maps for each level. They are assumed to have
                all the same number of channels, but they can have different sizes.
            boxes (List[Tensor[N, 4]]): boxes to be used to perform the pooling operation, in
                (x1, y1, x2, y2) format and in the image reference size, not the feature map
                reference.
            image_shapes (List[Tuple[height, width]]): the sizes of each image before they
                have been fed to a CNN to obtain feature maps. This allows us to infer the
                scale factor for each one of the levels to be pooled.
        Returns:
            result (Tensor)
        """
        x_filtered = []
        for k, v in x.items():
            if k in self.featmap_names:
                x_filtered.append(v)
        num_levels = len(x_filtered)
        rois = self.convert_to_roi_format(boxes)
        if self.scales is None:
            self.setup_scales(x_filtered, image_shapes)

        scales = self.scales
        assert scales is not None

        if num_levels == 1:
            return roi_align(
                x_filtered[0], rois,
                output_size=self.output_size,
                spatial_scale=scales[0],
                sampling_ratio=self.sampling_ratio
            )

        mapper = self.map_levels
        assert mapper is not None

        levels = mapper(boxes)

        num_rois = len(rois)
        num_channels = x_filtered[0].shape[1]

        dtype, device = x_filtered[0].dtype, x_filtered[0].device
        result = torch.zeros(
            (num_rois, num_channels,) + self.output_size,
            dtype=dtype,
            device=device,
        )

        tracing_results = []
        for level, (per_level_feature, scale) in enumerate(zip(x_filtered, scales)):
            idx_in_level = torch.where(levels == level)[0]
            rois_per_level = rois[idx_in_level]

            result_idx_in_level = roi_align(
                per_level_feature, rois_per_level,
                output_size=self.output_size,
                spatial_scale=scale, sampling_ratio=self.sampling_ratio)

            if torchvision._is_tracing():
                tracing_results.append(result_idx_in_level.to(dtype))
            else:
                # result and result_idx_in_level's dtypes are based on dtypes of different
                # elements in x_filtered.  x_filtered contains tensors output by different
                # layers.  When autocast is active, it may choose different dtypes for
                # different layers' outputs.  Therefore, we defensively match result's dtype
                # before copying elements from result_idx_in_level in the following op.
                # We need to cast manually (can't rely on autocast to cast for us) because
                # the op acts on result in-place, and autocast only affects out-of-place ops.
                result[idx_in_level] = result_idx_in_level.to(result.dtype)

        if torchvision._is_tracing():
            result = _onnx_merge_levels(levels, tracing_results)

        return result

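In fact, RoI Align did not exist yet when the Faster R-CNN paper was published; the paper still used the RoI Pool from Fast R-CNN. RoI Pool pools the features inside an RoI into a small feature map: an RoI window of variable shape $h \times w$ ($h, w$ vary from RoI to RoI) is pooled into a small feature map of uniform size $H \times W$ ($H, W$ are constants).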
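
As a rough illustration of that idea, the sketch below max-pools a window of a 2-D feature map (a list of rows) into a fixed grid, rounding every coordinate to an integer. It is a simplified, single-channel version written for this article under the assumption that the RoI lies inside the feature map; the function name roi_max_pool and its bin-boundary rule are my own choices, not torchvision code.

```python
import math


def roi_max_pool(feature, roi, out_h, out_w):
    """Max-pool roi = (x1, y1, x2, y2) of a 2-D feature map into out_h x out_w."""
    # First quantization: snap the window to integer feature-map coordinates.
    x1, y1, x2, y2 = (int(round(v)) for v in roi)
    h = max(y2 - y1 + 1, 1)
    w = max(x2 - x1 + 1, 1)
    pooled = []
    for i in range(out_h):
        # Second quantization: integer bin boundaries inside the window.
        ys = y1 + (i * h) // out_h
        ye = y1 + math.ceil((i + 1) * h / out_h)
        row = []
        for j in range(out_w):
            xs = x1 + (j * w) // out_w
            xe = x1 + math.ceil((j + 1) * w / out_w)
            row.append(max(feature[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# A 6x6 feature map whose value at (row, col) is 6 * row + col.
feature = [[r * 6 + c for c in range(6)] for r in range(6)]
tall = roi_max_pool(feature, (0.2, 0.3, 2.9, 3.8), 2, 2)   # 5 rows x 4 cols window
wide = roi_max_pool(feature, (1.0, 1.0, 5.0, 3.0), 2, 2)   # 3 rows x 5 cols window
print(tall, wide)
```

Both windows have different shapes, yet both come out as 2x2 grids, which is exactly the property the later fully connected layers rely on.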

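RoI Align was introduced in the Mask R-CNN paper. Its motivation is that RoI Pool is too coarse because of quantization: coordinates are rounded to integers when the window is mapped onto the feature map, and rounded again when the window is divided into bins, so the result is imprecise. This quantization has little impact on classification, but it becomes a problem for tasks that need pixel-level precision, such as image segmentation.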
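
The building block that lets RoI Align avoid rounding is bilinear interpolation: instead of snapping a sampling point to the nearest cell, the value is blended from the four surrounding cells. The following is a minimal single-channel sketch in plain Python; bilinear_sample is a hypothetical helper written for this article, and the real operator additionally averages several such samples per bin.

```python
import math


def bilinear_sample(feature, y, x):
    """Sample a 2-D feature map (list of rows) at fractional position (y, x)."""
    h, w = len(feature), len(feature[0])
    # Clamp the point so that all four neighbours exist.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    ly, lx = y - y0, x - x0
    # Interpolate along x on the two rows, then along y between them.
    top = feature[y0][x0] * (1 - lx) + feature[y0][x1] * lx
    bottom = feature[y1][x0] * (1 - lx) + feature[y1][x1] * lx
    return top * (1 - ly) + bottom * ly


feature = [[0.0, 10.0],
           [20.0, 30.0]]
center = bilinear_sample(feature, 0.5, 0.5)
off_center = bilinear_sample(feature, 0.25, 0.75)
print(center, off_center)
```

With rounding, every point in the upper-left quarter of this map would read the same cell; with interpolation, moving the sampling point by a fraction of a cell changes the result smoothly.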

For the details of how RoI Align improves on RoI Pool, and the math behind its bilinear interpolation, this article is worth a careful read:
Understanding Region of Interest — (RoI Align and RoI Warp) | by Kemal Erdem (burnpiro) | Towards Data Science

The core RoIAlign operation in MultiScaleRoIAlign is performed by calling the roi_align function in torchvision.ops.roi_align, which in turn only forwards the call to the underlying torch.ops.torchvision.roi_align operator.
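
Before roi_align is called, MultiScaleRoIAlign has to decide two things: the spatial_scale of each feature map and the pyramid level each box is pooled from. The sketch below re-implements the infer_scale logic from the listing above in plain Python and adds the level-assignment heuristic from the FPN paper. The function map_level and its constants (canonical scale 224, canonical level 4, a small epsilon) are assumptions of this sketch; they are meant to mirror the heuristic, not to reproduce torchvision's LevelMapper line by line.

```python
import math


def infer_scale(feature_hw, image_hw):
    """Round feature_size / image_size to the nearest power of two."""
    scales = [2.0 ** round(math.log2(f / i)) for f, i in zip(feature_hw, image_hw)]
    assert scales[0] == scales[1]
    return scales[0]


def map_level(box, k_min, k_max, canonical_scale=224, canonical_level=4, eps=1e-6):
    """Return the index (0-based, relative to k_min) of the level a box maps to."""
    x1, y1, x2, y2 = box
    s = math.sqrt((x2 - x1) * (y2 - y1))          # box size in image pixels
    k = math.floor(canonical_level + math.log2(s / canonical_scale) + eps)
    return min(max(k, k_min), k_max) - k_min      # clamp to the available levels


print(infer_scale((200, 272), (800, 1088)))       # stride-4 feature map
print(infer_scale((13, 17), (400, 540)))          # sizes that do not divide evenly
print([map_level(b, 2, 5) for b in [(0, 0, 32, 32), (0, 0, 112, 112),
                                    (0, 0, 224, 224), (0, 0, 1000, 1000)]])
```

Small boxes end up on the high-resolution levels and large boxes on the coarse ones, so each box is pooled from a feature map whose stride roughly matches its size.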

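In this example, the two input images each yield 1000 boxes after the RPN, 2000 boxes in total. After RoIPool / RoIAlign, the output is:

box_features (result): shape [2000, 256, 7, 7]

That is, each of the 2000 boxes has been pooled into a feature of size $C, H, W = 256, 7, 7$.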

TwoMLPHead

torchvision uses TwoMLPHead from torchvision.models.detection.faster_rcnn as the MLPHead implementation.

class TwoMLPHead(nn.Module):
    """
    Standard heads for FPN-based models

    Arguments:
        in_channels (int): number of input channels
        representation_size (int): size of the intermediate representation
    """

    def __init__(self, in_channels, representation_size):
        super(TwoMLPHead, self).__init__()

        self.fc6 = nn.Linear(in_channels, representation_size)
        self.fc7 = nn.Linear(representation_size, representation_size)

    def forward(self, x):
        x = x.flatten(start_dim=1)

        x = F.relu(self.fc6(x))
        x = F.relu(self.fc7(x))

        return x

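This part is not complicated: it is simply a two-layer MLP whose layers are named fc6 and fc7.

In this example, the box_features output by RoIAlign have shape [2000, 256, 7, 7]. Inside TwoMLPHead:

they are first flattened to shape [2000, 12544], since 256 × 7 × 7 = 12544;
after the two-layer MLP, they have shape [2000, 1024].

The return value is the box_features after this nonlinear transformation, with shape [2000, 1024].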
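
The flatten-then-MLP computation can be sketched for a single RoI in plain Python. The helpers flatten_chw, linear_relu and two_mlp_head are hypothetical names used only here, and the weights are tiny hand-picked values so the result can be checked by hand; the real head uses 12544 inputs and 1024 hidden units.

```python
def flatten_chw(feature):
    """Flatten a C x H x W nested list into one vector (like x.flatten(start_dim=1))."""
    return [v for channel in feature for row in channel for v in row]


def linear_relu(x, weight, bias):
    """One fully connected layer followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(w_row, x)) + b)
            for w_row, b in zip(weight, bias)]


def two_mlp_head(feature, fc6, fc7):
    x = flatten_chw(feature)
    x = linear_relu(x, *fc6)
    return linear_relu(x, *fc7)


# One RoI with C=2, H=W=2, so the flattened vector has 2 * 2 * 2 = 8 entries.
roi_feature = [[[1.0, 2.0], [3.0, 4.0]],
               [[5.0, 6.0], [7.0, 8.0]]]
fc6 = ([[1.0] * 8, [-1.0] * 8, [0.5] * 8], [0.0, 0.0, 0.0])        # 8 -> 3
fc7 = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],        # 3 -> 3
       [0.0, 1.0, -20.0])
out = two_mlp_head(roi_feature, fc6, fc7)
print(len(flatten_chw(roi_feature)), out)
```

The same bookkeeping with C=256 and H=W=7 gives the 12544-dimensional input of fc6 shown in the printed model.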

FastRCNNPredictor

torchvision uses FastRCNNPredictor from torchvision.models.detection.faster_rcnn as the Predictor implementation.

class FastRCNNPredictor(nn.Module):
    """
    Standard classification + bounding box regression layers
    for Fast R-CNN.

    Arguments:
        in_channels (int): number of input channels
        num_classes (int): number of output classes (including background)
    """

    def __init__(self, in_channels, num_classes):
        super(FastRCNNPredictor, self).__init__()
        self.cls_score = nn.Linear(in_channels, num_classes)
        self.bbox_pred = nn.Linear(in_channels, num_classes * 4)

    def forward(self, x):
        if x.dim() == 4:
            assert list(x.shape[2:]) == [1, 1]
        x = x.flatten(start_dim=1)
        scores = self.cls_score(x)
        bbox_deltas = self.bbox_pred(x)

        return scores, bbox_deltas

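This part is not complicated either. As described in the paper, it implements two prediction branches with fully connected layers:

the self.cls_score branch predicts the classification scores of each box;
the self.bbox_pred branch regresses the bounding-box deltas of each box for every class.

In torchvision's pretrained model, the num_classes of FastRCNNPredictor is 91, i.e. it can recognize 91 classes including the background.

In this example, based on the features extracted by RoIAlign and TwoMLPHead, the two branches output:

class_logits (scores): shape [2000, 91];
box_regression (bbox_deltas): shape [2000, 364].

That is, for the 2000 boxes on the 2 input images (1000 per image), every box gets a classification prediction, and a separate set of 4 box regression deltas is computed for each class (91 × 4 = 364).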
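
To show how one row of box_regression is used, the sketch below picks the 4 deltas belonging to a class out of the 364 values and applies them to a proposal. It is a simplified plain-Python version of what a box coder does: the weights (10, 10, 5, 5) are assumed to match the default that torchvision uses for RoIHeads, the clamping of very large dw and dh values is left out, and both function names are hypothetical.

```python
import math


def deltas_for_class(row, class_idx):
    """Slice the (dx, dy, dw, dh) of one class out of a num_classes * 4 row."""
    return row[4 * class_idx: 4 * class_idx + 4]


def decode_box(proposal, deltas, weights=(10.0, 10.0, 5.0, 5.0)):
    """Apply regression deltas to a proposal given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = proposal
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = (d / wt for d, wt in zip(deltas, weights))
    # Shift the centre proportionally to the box size, scale the size exponentially.
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


row = [float(i) for i in range(91 * 4)]            # stand-in for one box_regression row
print(deltas_for_class(row, 3))
print(decode_box((0.0, 0.0, 10.0, 20.0), (0.0, 0.0, 0.0, 0.0)))
print(decode_box((0.0, 0.0, 10.0, 20.0), (10.0, 0.0, 0.0, 0.0)))
```

All-zero deltas leave the proposal unchanged, and a dx of 10 (which becomes 1 after dividing by the weight) shifts the box by exactly one box width.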

postprocess_detections

After the Predictor has made its predictions, some postprocessing is still needed. It is implemented in the postprocess_detections function of torchvision.models.detection.roi_heads.RoIHeads.

class RoIHeads(torch.nn.Module):

    # ...

    def postprocess_detections(self,
                               class_logits,    # type: Tensor
                               box_regression,  # type: Tensor
                               proposals,       # type: List[Tensor]
                               image_shapes     # type: List[Tuple[int, int]]
                               ):
        # type: (...) -> Tuple[List[Tensor], List[Tensor], List[Tensor]]
        device = class_logits.device
        num_classes = class_logits.shape[-1]

        boxes_per_image = [boxes_in_image.shape[0] for boxes_in_image in proposals]
        pred_boxes = self.box_coder.decode(box_regression, proposals)

        pred_scores = F.softmax(class_logits, -1)

        pred_boxes_list = pred_boxes.split(boxes_per_image, 0)
        pred_scores_list = pred_scores.split(boxes_per_image, 0)

        all_boxes = []
        all_scores = []
        all_labels = []
        for boxes, scores, image_shape in zip(pred_boxes_list, pred_scores_list, image_shapes):
            boxes = box_ops.clip_boxes_to_image(boxes, image_shape)

            # create labels for each prediction
            labels = torch.arange(num_classes, device=device)
            labels = labels.view(1, -1).expand_as(scores)

            # remove predictions with the background label
            boxes = boxes[:, 1:]
            scores = scores[:, 1:]
            labels = labels[:, 1:]

            # batch everything, by making every class prediction be a separate instance
            boxes = boxes.reshape(-1, 4)
            scores = scores.reshape(-1)
            labels = labels.reshape(-1)

            # remove low scoring boxes
            inds = torch.where(scores > self.score_thresh)[0]
            boxes, scores, labels = boxes[inds], scores[inds], labels[inds]

            # remove empty boxes
            keep = box_ops.remove_small_boxes(boxes, min_size=1e-2)
            boxes, scores, labels = boxes[keep], scores[keep], labels[keep]

            # non-maximum suppression, independently done per class
            keep = box_ops.batched_nms(boxes, scores, labels, self.nms_thresh)
            # keep only topk scoring predictions
            keep = keep[:self.detections_per_img]
            boxes, scores, labels = boxes[keep], scores[keep], labels[keep]

            all_boxes.append(boxes)
            all_scores.append(scores)
            all_labels.append(labels)

        return all_boxes, all_scores, all_labels

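The postprocessing runs in a for loop over each image:

remove the boxes of the background class;
remove low-scoring boxes;
remove empty boxes (meaningless boxes of extremely small size);
run per-class NMS on the remaining boxes and scores;
keep the top-k detections, with k = self.detections_per_img.

Because the inputs in this example are simulated images filled with random values, there are no real objects, so all candidates from the 2000 proposals are filtered out at the low-score step.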
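
The filtering steps listed above can be sketched in plain Python for one image. Each candidate is a (box, score, label) tuple, with scores assumed to be softmax probabilities already; the function name postprocess and the default thresholds (0.05, 0.5, 100) are assumptions of this sketch, chosen to resemble typical Faster R-CNN settings, and the greedy loop stands in for box_ops.batched_nms.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def postprocess(candidates, score_thresh=0.05, nms_thresh=0.5,
                detections_per_img=100, min_size=1e-2):
    kept = [c for c in candidates if c[2] != 0]                  # drop background
    kept = [c for c in kept if c[1] > score_thresh]              # drop low scores
    kept = [c for c in kept                                      # drop empty boxes
            if c[0][2] - c[0][0] >= min_size and c[0][3] - c[0][1] >= min_size]
    kept.sort(key=lambda c: c[1], reverse=True)
    result = []
    for cand in kept:                                            # greedy per-class NMS
        if all(cand[2] != r[2] or iou(cand[0], r[0]) <= nms_thresh for r in result):
            result.append(cand)
    return result[:detections_per_img]                           # keep top-k


candidates = [
    ((0, 0, 10, 10), 0.90, 1),
    ((1, 1, 11, 11), 0.80, 1),       # overlaps the first box of the same class
    ((1, 1, 11, 11), 0.70, 2),       # same box, different class: survives NMS
    ((50, 50, 60, 60), 0.01, 1),     # score too low
    ((0, 0, 10, 10), 0.99, 0),       # background
    ((20, 20, 20.001, 30), 0.60, 1), # degenerate box
]
detections = postprocess(candidates)
print(detections)
```

Only two of the six candidates survive, which also illustrates why an input without real objects can easily end up with no detections at all.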

GeneralizedRCNNTransform.postprocess

The final postprocessing of the Faster R-CNN model is implemented by the GeneralizedRCNNTransform class in torchvision.models.detection.transform.

class GeneralizedRCNNTransform(nn.Module):

    # ...

    def postprocess(self,
                    result,               # type: List[Dict[str, Tensor]]
                    image_shapes,         # type: List[Tuple[int, int]]
                    original_image_sizes  # type: List[Tuple[int, int]]
                    ):
        # type: (...) -> List[Dict[str, Tensor]]
        if self.training:
            return result
        for i, (pred, im_s, o_im_s) in enumerate(zip(result, image_shapes, original_image_sizes)):
            boxes = pred["boxes"]
            boxes = resize_boxes(boxes, im_s, o_im_s)
            result[i]["boxes"] = boxes
            if "masks" in pred:
                masks = pred["masks"]
                masks = paste_masks_in_image(masks, boxes, o_im_s)
                result[i]["masks"] = masks
            if "keypoints" in pred:
                keypoints = pred["keypoints"]
                keypoints = resize_keypoints(keypoints, im_s, o_im_s)
                result[i]["keypoints"] = keypoints
        return result

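For object detection, the only thing actually done here is resizing the box coordinates. During preprocessing, GeneralizedRCNNTransform resizes the input images, and when converting them to a tensor it also pads them so that the image tensors in the same batch share one size. So when the results are output, the coordinates on the tensor still have to be converted back to coordinates on the original image scale.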
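
The coordinate conversion itself is just a per-axis rescale. Here is a minimal plain-Python sketch, assuming sizes are given as (height, width) and boxes as (x1, y1, x2, y2); it is written for illustration and is not torchvision's resize_boxes, although it follows the same idea.

```python
def resize_boxes(boxes, network_size, original_size):
    """Map boxes from the resized image back to the original image.

    network_size and original_size are (height, width) tuples.
    """
    ratio_h = original_size[0] / network_size[0]
    ratio_w = original_size[1] / network_size[1]
    return [(x1 * ratio_w, y1 * ratio_h, x2 * ratio_w, y2 * ratio_h)
            for x1, y1, x2, y2 in boxes]


# The network saw an 800 x 1000 image, the original was 400 x 500.
restored = resize_boxes([(100, 200, 300, 400)], (800, 1000), (400, 500))
print(restored)
```

Padding does not need to be undone separately in this sketch, because padded pixels are appended on the right and bottom and therefore do not shift the coordinates of the actual image content.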

3 Summary

Finally, to get an overview of the whole model structure, a simple print is enough:

print(model)

which outputs:

FasterRCNN(
  (transform): GeneralizedRCNNTransform(
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
      Resize(min_size=(800,), max_size=1333, mode='bilinear')
  )
  (backbone): BackboneWithFPN(
    (body): IntermediateLayerGetter(
      (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): FrozenBatchNorm2d(64)
      (relu): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(64)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(64)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(256)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): FrozenBatchNorm2d(256)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(64)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(64)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(256)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(64)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(64)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(256)
          (relu): ReLU(inplace=True)
        )
      )
      (layer2): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(128)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(128)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(512)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): FrozenBatchNorm2d(512)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(128)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(128)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(512)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(128)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(128)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(512)
          (relu): ReLU(inplace=True)
        )
        (3): Bottleneck(
          (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(128)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(128)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(512)
          (relu): ReLU(inplace=True)
        )
      )
      (layer3): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(256)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(256)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(1024)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): FrozenBatchNorm2d(1024)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(256)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(256)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(1024)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(256)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(256)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(1024)
          (relu): ReLU(inplace=True)
        )
        (3): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(256)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(256)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(1024)
          (relu): ReLU(inplace=True)
        )
        (4): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(256)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(256)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(1024)
          (relu): ReLU(inplace=True)
        )
        (5): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(256)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(256)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(1024)
          (relu): ReLU(inplace=True)
        )
      )
      (layer4): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(512)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(512)
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(2048)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): FrozenBatchNorm2d(2048)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(512)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(512)
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(2048)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(512)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(512)
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(2048)
          (relu): ReLU(inplace=True)
        )
      )
    )
    (fpn): FeaturePyramidNetwork(
      (inner_blocks): ModuleList(
        (0): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
        (1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
        (2): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
        (3): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
      )
      (layer_blocks): ModuleList(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
      (extra_blocks): LastLevelMaxPool()
    )
  )
  (rpn): RegionProposalNetwork(
    (anchor_generator): AnchorGenerator()
    (head): RPNHead(
      (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (cls_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1))
      (bbox_pred): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1))
    )
  )
  (roi_heads): RoIHeads(
    (box_roi_pool): MultiScaleRoIAlign()
    (box_head): TwoMLPHead(
      (fc6): Linear(in_features=12544, out_features=1024, bias=True)
      (fc7): Linear(in_features=1024, out_features=1024, bias=True)
    )
    (box_predictor): FastRCNNPredictor(
      (cls_score): Linear(in_features=1024, out_features=91, bias=True)
      (bbox_pred): Linear(in_features=1024, out_features=364, bias=True)
    )
  )
)

The model structure can be summarized as the following hierarchy:

Faster R-CNN
- (transform): GeneralizedRCNNTransform
- (backbone): BackboneWithFPN
  - (body): IntermediateLayerGetter
  - (fpn): FeaturePyramidNetwork
- (rpn): RegionProposalNetwork
  - (anchor_generator): AnchorGenerator
  - (head): RPNHead
- (roi_heads): RoIHeads
  - (box_roi_pool): MultiScaleRoIAlign
  - (box_head): TwoMLPHead
  - (box_predictor): FastRCNNPredictor

Go Notes

2020-11-19T15:14:05.000Z

Learning the Go language in my spare time.

Go Notes

Reference book:

朱荣鑫，黄迪璇，张天. Go语言高并发与微服务实战[M].北京：中国铁道出版社有限公司，2020.

The code is kept in sync on GitHub: HearyShen/LearnGo

1 Basic variables

package main

import (
	"fmt"
	"math"
	"unicode/utf8"
)

// run with command:
// go run src\1_variables.go

func main() {
	// 3 ways to declare and initialize a variable
	fmt.Printf("3 ways to declare and init a variable:\n")

	var a int = 100
	var b = "100"
	c := 0.17

	fmt.Printf("a value = %v (%T)\n", a, a)
	fmt.Printf("b value = %v (%T)\n", b, b)
	fmt.Printf("c value = %v (%T)\n", c, c)
	fmt.Println()

	// swap variables
	fmt.Printf("Easy way to swap variables:\n")
	v1 := 1
	v2 := 2
	fmt.Printf("before swap: v1 = %v (%T), v2 = %v (%T)\n", v1, v1, v2, v2)

	v1, v2 = v2, v1
	fmt.Printf("after swap: v1 = %v (%T), v2 = %v (%T)\n", v1, v1, v2, v2)
	fmt.Println()

	/*
		Integer
		signed: int8, int16, int32, int64
		unsigned: uint8, uint16, uint32, uint64
	*/
	var vUint16 uint16 = math.MaxUint8 + 1
	// vUint16 = math.MaxUint16 + 1 // compile error: constant 65536 overflows uint16
	fmt.Printf("vUint16 = %v (%T)\n", vUint16, vUint16)

	var vUint8 uint8 = uint8(vUint16)
	fmt.Printf("vUint8 = %v (%T)\n", vUint8, vUint8) // truncated to the low byte: 0000 0001 0000 0000 -> 0000 0000

	/*
		Float
		float32, float64
	*/
	var vFloat32 float32 = math.E
	var vFloat64 float64 = math.E
	fmt.Printf("vFloat32 = %f (%T)\n", vFloat32, vFloat32)
	fmt.Printf("vFloat64 = %.10f (%T)\n", vFloat64, vFloat64)

	/*
		Bool
		true/false, can not cast to integer types
	*/
	var vBool bool = true
	fmt.Printf("vBool = %v (%T)\n", vBool, vBool)

	/*
		String
	*/
	var vStr string = "你好, Go!"
	fmt.Printf("vStr = \"%s\" (%T)\n", vStr, vStr)
	fmt.Printf("byte len of vStr = %v\n", len(vStr))                    // 3*2 + 5*1 = 11, each Chinese character takes 3 bytes in UTF-8
	fmt.Printf("rune len of vStr = %v\n", utf8.RuneCountInString(vStr)) // 7
	// traverse each unicode character
	for i, h := range vStr {
		fmt.Printf("[%v:%c]", i, h)
	}
	fmt.Println()

	/*
		Pointer
	*/
	var ptrStr *string = &vStr
	fmt.Printf("ptrStr = %v (%T)\n", ptrStr, ptrStr)
	fmt.Printf("*ptrStr = %v (%T)\n", *ptrStr, *ptrStr)

	/*
		Struct
	*/
	var vStruct struct {
		id     int
		name   string
		salary float32
	}
	vStruct.id = 1
	vStruct.name = "Mike"
	vStruct.salary = 123.45
	fmt.Printf("vStruct = %v (%T)\n", vStruct, vStruct)
}

2 Command-line arguments: flag

package main

import (
"flag"
"fmt"
)

// run with command:
// go run src\2_flag.go --username mike --password 123456 --id 1

func main() {
// argument name, default value, tips
var username *string = flag.String("username", "guest", "a string of username")

var password string
flag.StringVar(&password, "password", "123", "a string of password")

id := flag.Int("id", 0, "an integer of id")

flag.Parse()
fmt.Printf("username = %v, password = %v, id = %v\n", *username, password, *id)
}

3 Constants: const

package main

import "fmt"

// run with command:
// go run src\3_const.go

func main() {
const helloStr = "Hello, world!"
const (
name   = "Mike"
salary = 123.45
)
fmt.Printf("%v (%T)\n", name, name)

// The lines below do not compile; they stay commented out
// to show what a constant forbids.
// name = "Tom"     // compile error: cannot assign to name
// ptrName := &name // compile error: cannot take the address of name
// *ptrName = "Tom"
}

4 Types: type

package main

import "fmt"

// run with command:
// go run src\4_type.go

type aliasInt = int // declare an alias for int
type myInt int      // defines a new type

// BasicPerson is a struct type
type BasicPerson struct {
name string
age  uint8
}

func main() {
var vAliasInt aliasInt
fmt.Printf("vAliasInt = %v (%T)\n", vAliasInt, vAliasInt)

var vMyInt myInt
fmt.Printf("vMyint = %v (%T)\n", vMyInt, vMyInt)

var person BasicPerson
person.name = "Mike"
person.age = 20
fmt.Printf("person = %v (%T)\n", person, person)
}

5 Conditionals: if-else

package main

import (
"flag"
"fmt"
)

// run with command:
// go run src\5_ifelse.go --score 100

func main() {
score := flag.Int("score", -1, "Score of a test")

flag.Parse()
fmt.Printf("score = %v (%T)\n", *score, *score)

if *score < 60 {
fmt.Println("Fail to pass")
} else if *score < 80 {
fmt.Println("Fine")
} else if *score <= 100 {
fmt.Println("Excellent")
} else {
fmt.Println("Wrong score")
}
}

6 Conditionals: switch-case

package main

import (
"flag"
"fmt"
)

// run with command:
// go run src\6_switchcase.go -score 100 -course CPP

func main() {
score := flag.Int("score", -1, "Score of a test")
course := flag.String("course", "CPP", "Course name")
flag.Parse()

fmt.Printf("score = %v (%T)\n", *score, *score)
fmt.Printf("course = %v (%T)\n", *course, *course)

// Go cases do not fall through by default, so no break is needed
switch {
case *score < 60:
fmt.Println("fail to pass")
case *score < 80:
fmt.Println("fine")
case *score <= 100:
fmt.Println("excellent")
default:
fmt.Println("wrong score")
}

switch *course {
case "CPP":
fmt.Println("C plus plus")
case "PY":
fmt.Println("Python")
default:
fmt.Println("Unknown")
}
}

7 Loops: for

package main

import "fmt"

// run with commands:
// go run src\7_forloop.go

// Golang only has for loop
// Golang does not support while and do-while loop

func main() {
for i := 0; i < 10; i++ {
fmt.Println(i)
}
}
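
Because Go has only the for keyword, the while-style and infinite loops are written with for as well. The sketch below illustrates both forms; the function names sumWhile and firstPowerOfTwoAbove are made up for this example and are not part of the original repository.

```go
package main

import "fmt"

// sumWhile adds 1..n using the "while" form of for: a condition only.
func sumWhile(n int) int {
	sum, i := 0, 1
	for i <= n {
		sum += i
		i++
	}
	return sum
}

// firstPowerOfTwoAbove uses the infinite form of for and leaves it with return.
func firstPowerOfTwoAbove(limit int) int {
	p := 1
	for {
		if p > limit {
			return p
		}
		p *= 2
	}
}

func main() {
	fmt.Println(sumWhile(10))              // 55
	fmt.Println(firstPowerOfTwoAbove(100)) // 128
}
```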

8 Arrays: array

package main

import "fmt"

func main() {
// init array
var colors [3]string
colors[0] = "Red"
colors[1] = "Green"
colors[2] = "Blue"
fmt.Println(colors)

// init array with initial values
languages := [...]string{"C", "C++", "Java"}
fmt.Println(languages)

// init array with new(type), returns a pointer
nations := new([3]string)
nations[0] = "China"
nations[1] = "India"
nations[2] = "Japan"
fmt.Println(*nations)
}

9 Slices: slice

package main

import "fmt"

// run with commands:
// go run src\9_slice.go

// slice is a variable length sequence of data

func main() {
// [...]int{} marks a fixed length array,
// while []int{} declares a variable length slice
slice := []int{1, 2, 3, 4, 5, 6}
printSlice("slice", slice)
printSlice("subSlice", slice[3:])

// slice can be appended with one or more elements
// If it has sufficient capacity, the destination is resliced to accommodate the new elements.
// If it does not, a new underlying array will be allocated.
slice = append(slice, 7, 8)
printSlice("slice", slice)
printSlice("subSlice", slice[3:])

// slice can also be created from an array
arr := [...]int{11, 22, 33, 44, 55}
arrSlice := arr[:3]
printSlice("arrSlice", arrSlice)

// slice can also be dynamically created with make([]T, size, cap)
madeSlice := make([]int, 8, 16)
printSlice("madeSlice", madeSlice)

// slice can be copied from src slice to dest slice
copy(madeSlice, slice)
printSlice("copiedSlice", madeSlice)
}

func printSlice(tag string, slice []int) {
fmt.Printf("%s: data = %v, len = %d, cap = %d, addr = %p\n", tag, slice, len(slice), cap(slice), &slice)
}
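
The comment about append in the code above (reslice when capacity suffices, otherwise allocate a new array) has a visible consequence: slices share storage until a reallocation separates them. A minimal sketch, with the helper name appendDemo chosen only for illustration:

```go
package main

import "fmt"

// appendDemo returns base[0] and b[0] after writing through a.
func appendDemo() (int, int) {
	base := make([]int, 3, 4) // len 3, cap 4
	a := append(base, 1)      // fits in the spare capacity: shares base's array
	b := append(a, 2)         // capacity exceeded: copied into a new array
	a[0] = 100                // visible through base, not through b
	return base[0], b[0]
}

func main() {
	fmt.Println(appendDemo()) // 100 0
}
```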

10 Lists: list

package main

import (
"container/list"
"fmt"
)

// run with commands:
// go run src\10_list.go

// list.List in go is a doubly linked list

func main() {
// create a list
numsList := list.New()

// append elements to list with PushBack
for i := 1; i < 10; i++ {
numsList.PushBack(i)
}
printList(numsList)

// add elements to front with PushFront
first := numsList.PushFront(0)
printList(numsList)

// elements can be removed
numsList.Remove(first)
printList(numsList)
}

func printList(srcList *list.List) {
for node := srcList.Front(); node != nil; node = node.Next() {
fmt.Print(node.Value, " ")
}
fmt.Println()
}

11 Maps: map

package main

import "fmt"

// run with commands:
// go run src\11_map.go

func main() {
// create a map
emptyMap := map[string]string{}
printMap(emptyMap)

// create and init a map
city2id := map[string]string{
"Suzhou":   "0512",
"Beijing":  "010",
"Shanghai": "021",
}
printMap(city2id)

// create a map with make(type)
id2city := make(map[string]string)

id2city["0512"] = "Suzhou"
id2city["010"] = "Beijing"
id2city["021"] = "Shanghai"

fmt.Printf("Query: 010, Result: %s\n", id2city["010"])
printMap(id2city)
}

func printMap(srcMap map[string]string) {
fmt.Printf("len = %d\n", len(srcMap))
for k, v := range srcMap {
fmt.Println(k, v, " ")
}
fmt.Println()
}

12 Functions: func

package main

import (
"fmt"
"time"
)

// run with commands:
// go run src\12_func.go

/*
A standard paradigm of function in go:

func func_name(inputParams) (returnParams) {
func body
}
*/

// a func with multiple inputs and single output
func add(x, y int) int {
return x + y
}

// a func with multiple inputs and multiple outputs
// return values can be named
func div(dividend, divisor int) (quotient, remainder int) {
quotient = dividend / divisor
remainder = dividend % divisor
return
}

func addMul(x, y int) (int, int) {
vAdd := x + y
vMul := x * y
return vAdd, vMul
}

// a function with input but no return value
func echo(s string) {
fmt.Println(s)
}

// input a func as a callback func
func traverse(arr []int, handler func(num int)) {
for _, v := range arr {
handler(v)
}
}

// pass a pointer to func
func increase(x *int) {
*x++ // increment the value the pointer refers to
}

func main() {
// call a named function
fmt.Println(add(1, 2))
fmt.Println(div(8, 5))
fmt.Println(addMul(2, 3))
echo("Hello, world!")

// define and call an anonymous func
vMul := func(x, y int) int {
return x * y
}(1, 2)
fmt.Println(vMul)

curTime := func() {
fmt.Println(time.Now())
}
curTime()

// use an anonymous func as callback function
arr := []int{1, 2, 3, 4, 5}
traverse(arr, func(num int) {
fmt.Print(num*num, " ")
})
fmt.Println()

// pass num to func by its pointer
vNum := 1
fmt.Println("Before: ", vNum)
increase(&vNum)
fmt.Println("After: ", vNum)
}

13 Closures: closure

package main

import "fmt"

// run with commands:
// go run src\13_closure.go

// closure is a function carrying state

func createCounter(initial int) func() int {
if initial < 0 {
initial = 0
}

return func() int {
initial++
return initial
}
}

func main() {
counter1 := createCounter(0)
fmt.Println(counter1()) // 1
fmt.Println(counter1()) // 2

counter2 := createCounter(10)
fmt.Println(counter2()) // 11

fmt.Println(counter1()) // 3
}

14 Structs: struct

package main

import "fmt"

// run with commands:
// go run src\14_struct.go

/*
struct can be defined in paradigm:

type structName struct {
value1 valueType1
value2 valueType2
...
}
*/

type Person struct {
Name  string
Birth string
ID    uint64
}

func main() {
// declare a struct variable
var person1 Person
person1.Name = "Mike"
person1.Birth = "1990-1-2"
person1.ID = 1
fmt.Println(person1)

// new a struct variable
person2 := new(Person) // person2 is a pointer
person2.Name = "Tom"
person2.Birth = "1991-2-3"
person2.ID = 2
fmt.Println(person2)

// another way to new a person with empty initial values
person3 := &Person{} // person3 is a pointer
person3.Name = "Nancy"
person3.Birth = "1992-3-4"
person3.ID = 3
fmt.Println(person3)

// create an object with initial values
person4 := Person{
Name:  "Jack",
Birth: "1993-4-5",
ID:    4,
}
fmt.Println(person4)

person5 := &Person{ // person5 is a pointer
Name:  "John",
Birth: "1994-5-6",
ID:    5,
}
fmt.Println(person5)
}

15 Methods: method

package main

import "fmt"

// run with commands:
// go run src\15_method.go

/*
In Go, a method is a function with a receiver.
The receiver is typically a struct, but any type defined in the
package can have methods.

A method is defined in the form:

func (receiver ReceiverType) methodName(inputParams) (returnParams) {
func body
}
*/

type Student struct {
Name string
Age  uint8
ID   uint64
}

// modify student's name with pointer to the instance
func (student *Student) setName(name string) {
student.Name = name
}

// non-pointer, unable to modify the original instance
func (student Student) badSet(name string) {
student.Name = name
}

func (student Student) print() {
fmt.Printf("Student %s (ID: %v) is %v years old.\n",
student.Name, student.ID, student.Age)
}

func main() {
student1 := Student{
Name: "Jack",
Age:  12,
ID:   1,
}

student1.print()

student1.badSet("Little Jack")
student1.print()

student1.setName("Big Jack")
student1.print()
}

16 Interfaces: interface

package main

import "fmt"

// run with commands:
// go run src\16_interface1.go

/*
standard interface paradigm:

type interfaceName interface {
func1(inputParams) (returnParams)
func2(inputParams) (returnParams)
func3(inputParams) (returnParams)
}

If the interface name starts with an uppercase letter, it is an exported (public) interface.
If the function name starts with an uppercase letter, it is an exported (public) function.
An exported function can be accessed from outside the package;
an unexported function can only be accessed inside
the package.
*/

type Cat interface {
CatchMouse()
}

type Dog interface {
Bark()
}

type CatDog struct {
Name string
}

// type CatDog implements functions in interface Cat
func (catDog *CatDog) CatchMouse() {
fmt.Printf("%s is catching mice!\n", catDog.Name)
}

// type CatDog implements functions in interface Dog
func (catDog *CatDog) Bark() {
fmt.Printf("%s is barking!\n", catDog.Name)
}

func main() {
// catDog is a pointer to CatDog instance
catDog := &CatDog{
Name: "Tom",
}

// declare Cat interface and point to CatDog type
var cat Cat
cat = catDog
cat.CatchMouse()

// declare Dog interface and point to CatDog type
var dog Dog
dog = catDog
dog.Bark()
}

Another example:

package main

import "fmt"

// run with commands:
// go run src\16_interface2.go

/*
standard interface paradigm:

type interfaceName interface {
func1(inputParams) (returnParams)
func2(inputParams) (returnParams)
func3(inputParams) (returnParams)
}

If the interface name starts with an uppercase letter, it is an exported (public) interface.
If the function name starts with an uppercase letter, it is an exported (public) function.
An exported function can be accessed from outside the package;
an unexported function can only be accessed inside
the package.
*/

type Printer interface {
Print(interface{})
}

type FuncCaller func(p interface{})

func (funcCaller FuncCaller) Print(p interface{}) {
funcCaller(p)
}

func main() {
// Printer is the abstraction of printer
// FuncCaller func is the implementation of printer
// printer can call Printer.Print implemented by FuncCaller's Print
var printer Printer
printer = FuncCaller(func(p interface{}) {
fmt.Println(p)
})
// cast an anonymous function to FuncCaller type
// then printer calls Print implemented by FuncCaller
printer.Print("Golang is Good!")
}

17 Embedding

package main

import "fmt"

// run with commands:
// go run src\16_interface2.go

/*
struct can embed anonymous attributes (type-only) to implement composition relation.

standard embedded struct type paradigm:

type A struct {
typeB
typeC
}
*/

type Swimming struct {
}

func (swim *Swimming) swim() {
fmt.Println("swimming")
}

type Flying struct {
}

func (flying *Flying) fly() {
fmt.Println("flying")
}

// Wild Duck can swim and fly
type WildDuck struct {
Swimming
Flying
}

// Domestic Duck can only swim
type DomesticDuck struct {
Swimming
}

func main() {
wildDuck := WildDuck{}
wildDuck.fly()
wildDuck.swim()

domesticDuck := DomesticDuck{}
domesticDuck.swim()
}

Analysis of the "kernel module (nvidia_modeset) in use" error when reinstalling the NVIDIA driver on Linux

2020-10-08T14:59:47.000Z

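When reinstalling the NVIDIA driver on Linux, I ran into the error kernel module (nvidia_modeset) in use. This post tracks down the cause and gives a solution that allows a clean reinstall without rebooting the system.

1 Problem description

When reinstalling the NVIDIA driver on Linux, the installer fails because a kernel module is still in use, kernel module (nvidia_modeset) in use, so the new driver cannot be installed.

The error page of the NVIDIA installer suggests simply rebooting. But on a server where other users' compute jobs are running and should not be interrupted, can the reboot be avoided?

Without knowing the cause, unloading the module directly with rmmod nvidia_modeset fails because the module is in use. Forcing the unload with rmmod -f is risky and may crash the system.

2 Cause analysis

In principle, once the old driver has been uninstalled, no driver-related kernel module should still be in use.

Since the message points to a kernel module being held, first check module usage with lsmod, which shows a reference relationship like the following: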

Module                  Size  Used by
nvidia_modeset       1183744  1
nvidia              19722240  1 nvidia_modeset

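This shows that the kernel module nvidia_modeset depends on the kernel module nvidia.

Checking the NVIDIA-related processes with ps -aux | grep nvidia reveals that the NVIDIA persistence mode daemon is what holds nvidia_modeset. That daemon exists to work around slow nvidia-smi start-up: enabling persistence mode with sudo nvidia-persistenced --persistence-mode lets a daemon keep track of the GPU state, so that each nvidia-smi call does not have to synchronously probe every GPU and spend a long time blocked.

3 Solution

With the cause identified, the fix is straightforward.

First, use ps -aux | grep nvidia to find the process that is using nvidia_modeset.

Next, terminate the persistence mode daemon with sudo kill [pid].

Verify with ps, wait for the process to exit, then check lsmod again: nvidia_modeset is no longer in use.

Now unload the leftover NVIDIA kernel modules with rmmod; no error is reported this time.

With the kernel modules cleaned up, rerun the NVIDIA driver installer and everything proceeds normally.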

Redis Notes

2020-08-19T12:29:13.000Z

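Redis is an efficient in-memory key-value database and well worth learning. This post collects my notes from studying it.

Qian Wenpin. Redis深度历险：核心原理与应用实践 [M]. Beijing: Publishing House of Electronics Industry, 2019.

1 Overview

1.1 Introduction

Redis is mainly used for:

Caching: like counts, hot posts, user behaviour, leaderboards, and so on;
Distributed locks.

1.2 Installation

# ubuntu
sudo apt install redis

# centos
sudo yum install redis

1.3 Running

# redis command-line interface
redis-cli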

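2 Data structures

Redis provides five basic data structures: string, list, hash (dictionary), set and zset (sorted set).

2.1 string

Every Redis data structure is named by a unique key string, through which the corresponding value is retrieved.

Internally a Redis string resembles Java's ArrayList: it preallocates spare capacity to avoid frequent memory allocation. While the string is smaller than 1 MB, growth doubles the current capacity; beyond 1 MB, each growth adds 1 MB. The maximum string length is 512 MB.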
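
The growth rule can be written down as a small function. This is only a sketch of the rule as stated in these notes, not Redis's source code; the name nextCapacity is hypothetical.

```go
package main

import "fmt"

const oneMB = 1024 * 1024

// nextCapacity applies the rule described above: double while the needed
// length is under 1 MB, otherwise add a fixed 1 MB of headroom.
func nextCapacity(needed int) int {
	if needed < oneMB {
		return needed * 2
	}
	return needed + oneMB
}

func main() {
	fmt.Println(nextCapacity(100))       // 200
	fmt.Println(nextCapacity(2 * oneMB)) // 3145728 (3 MB)
}
```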

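2.2 list

A Redis list is similar to Java's LinkedList (though not exactly the same): a doubly linked list with O(1) insertion and deletion and O(N) lookup.

When the last element of a list is removed, the data structure is deleted automatically and its memory is reclaimed.

A doubly linked list can be used to implement queues and stacks.

2.2.1 Underlying implementation: ziplist and quicklist

The underlying implementation of a Redis list is the quicklist data structure.

When a list has few elements, a ziplist (compressed list) is used. A ziplist stores all elements contiguously in one block of memory.

When a list has many elements, a quicklist is used. A quicklist combines a linked list with ziplists: each ziplist node holds several elements yet needs only one pair of prev/next pointers, so a quicklist avoids the cost of a prev/next pointer pair per element. It offers fast insertion and deletion without much space overhead.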

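2.3 hash

A Redis hash is similar to Java's HashMap: an unordered dictionary that stores key-value pairs.

A hash uses an array plus linked lists, but its values can only be strings.

When the last element of a hash is removed, the data structure is deleted automatically and its memory is reclaimed.

2.3.1 Progressive rehash

Java's HashMap rehashes everything in one go, whereas a Redis hash rehashes progressively to avoid blocking the service.

During a progressive rehash, both the old and the new hash table are kept. The old table still serves queries while its entries are gradually rehashed into the new table. Only when the rehash is complete does the new table replace the old one.

2.4 set

A Redis set corresponds to Java's HashSet: its elements are unordered and unique.

Under the hood a set is essentially a hash whose values are all NULL.

When the last element of a set is removed, the data structure is deleted automatically and its memory is reclaimed.

2.5 zset (sorted set)

A Redis zset resembles a combination of Java's SortedSet and HashMap.

On the one hand a zset is a set, so its values are unique; on the other hand it is ordered by a score assigned to each value as the sort weight.

When the last element of a zset is removed, the data structure is deleted automatically and its memory is reclaimed.

2.5.1 skiplist

The ordering inside a zset is implemented with a skiplist.

In a skiplist, the higher-level linked lists have larger spans and connect elements far apart; the lower the level, the smaller the span. Searching narrows from the large spans down to the small ones, which locates insertion and lookup positions quickly.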

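2.6 Summary

2.6.1 Common properties

The four container types list, set, hash and zset share two properties:

create if not exists: if the container does not exist when operated on, it is created;
drop if no elements: if the container is empty after an operation, it is deleted.

2.6.2 Expiration

Every data structure can be given an expiration time and is deleted once it expires.

If a string with an expiration time is then modified with set, the expiration time is cleared.

3 Applications

3.1 Distributed lock

The setnx (set if not exists) command sets the lock marker and del removes it.

> setnx lock:resource_a true
> expire lock:resource_a 5
> del lock:resource_a

A transaction may modify the contended resource only after confirming that it has set the distributed lock on it.
expire sets a 5-second expiration to prevent deadlock.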
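
The semantics of the three commands can be illustrated without a Redis server. The sketch below uses an in-memory map as a stand-in for Redis keys; the type lockTable and its methods are hypothetical, and a real client would issue the commands above against Redis instead.

```go
package main

import "fmt"

// lockTable is an in-memory stand-in for Redis keys: key -> expiry time in seconds.
type lockTable map[string]int64

// acquire mimics "setnx key" followed by "expire key ttl": it succeeds only
// if the key is absent or its previous expiry has already passed.
func (t lockTable) acquire(key string, ttlSec, nowSec int64) bool {
	if exp, ok := t[key]; ok && nowSec < exp {
		return false
	}
	t[key] = nowSec + ttlSec
	return true
}

// release mimics "del key".
func (t lockTable) release(key string) { delete(t, key) }

func main() {
	locks := lockTable{}
	fmt.Println(locks.acquire("lock:resource_a", 5, 0)) // true: lock taken
	fmt.Println(locks.acquire("lock:resource_a", 5, 1)) // false: still held
	fmt.Println(locks.acquire("lock:resource_a", 5, 6)) // true: the old lock expired
	locks.release("lock:resource_a")
}
```

The third call shows why the expiration matters: a holder that never calls del cannot block others forever.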

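3.2 Delay queue

A list can serve as an asynchronous message queue.

rpush/lpush enqueue; lpop/rpop dequeue.

blpop/brpop read in a blocking manner.

3.3 Bitmap

get/set operate on the whole bitmap.

getbit/setbit operate on individual bits.

bitcount counts the 1 bits in a range.

bitpos finds the position of the first 0 or 1.

bitfield has the subcommands get/set/incrby, which read, set and increment a specified range of bits. Several subcommands can be combined in a single bitfield call.

3.4 HyperLogLog

Counting PV needs no deduplication; incrby is enough. Counting UV does need deduplication rather than a plain increment, and the set normally used for deduplication consumes a huge amount of memory when the data volume is large.

HyperLogLog addresses deduplicated counting.

pfadd adds an element (and counts it);

pfcount returns the count.

pfmerge merges several pf counters into one, combining their counts.

pf stands for Professor Philippe Flajolet, the inventor of HyperLogLog.

The HyperLogLog structure uses a sparse representation while the count is small and switches to a dense representation once the count passes a threshold.

A HyperLogLog occupies 12 KB, far less than a set when the data volume is large.

The principle is to track K, the maximum run of consecutive zero bits in the low bits of the hashed elements. The larger K is, the less likely it is to occur by chance, so a large K indicates a large count N; N can therefore be estimated from a bounded K, with K roughly linear in $\log_2 N$. The 12 KB comes from Redis's implementation using $2^{14}=16384$ buckets of 6 bits each: $2^{14} \times 6\,\text{bit} \div 8\,\text{bit/byte} = 12\,\text{KB}$.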
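
The two numbers in this paragraph, the 12 KB size and the zero-run length K, can be checked with a few lines of Go. This is an illustrative sketch only: the function names are hypothetical, and a real HyperLogLog keeps one K per bucket and combines them, which is omitted here.

```go
package main

import (
	"fmt"
	"math/bits"
)

// denseBytes returns the size of 2^bucketBits registers of bitsPerBucket bits each.
func denseBytes(bucketBits, bitsPerBucket uint) uint {
	return (uint(1) << bucketBits) * bitsPerBucket / 8
}

// maxTrailingZeros returns K, the longest run of low zero bits among the hashes.
func maxTrailingZeros(hashes []uint64) int {
	k := 0
	for _, h := range hashes {
		if z := bits.TrailingZeros64(h); z > k {
			k = z
		}
	}
	return k
}

func main() {
	fmt.Println(denseBytes(14, 6))                    // 12288 bytes = 12 KB
	fmt.Println(maxTrailingZeros([]uint64{8, 2, 1})) // 3 (8 is binary 1000)
}
```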

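3.5 Bloom filter

RedisBloom
RedisBloom: Probabilistic Data Structures for Redis
The RedisBloom module provides four data structures: a scalable Bloom filter, a cuckoo filter, a count-min sketch, and a top-k. These data structures trade perfect accuracy for extreme memory efficiency, so they're especially useful for big data and streaming applications.

bf.add adds an element;

bf.exists checks whether an element exists.

bf.madd adds several elements;

bf.mexists checks several elements (returning a 0/1 for each element).

bf.reserve presets the filter's key, error_rate and initial_size before any element is added.

How a Bloom filter works:

On add, compute k hashes of the element and set the corresponding k bits to 1;
On lookup, compute the k hashes and check whether the corresponding k bits are all 1. If any bit is 0 the element is definitely absent; if all are 1 the element is reported as present, which may be a false positive.

For k hash functions, m bits in the filter, n expected elements and error rate f:

\[k = \ln 2 \times (m/n) \approx 0.7 \times (m/n) \\ f = 2^{-k} \approx 0.6185^{m/n}\]

With this k the error rate is minimal.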
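
To get a feel for the formulas, the sketch below evaluates them for one illustrative sizing (10 bits per expected element). The function names and the numbers are assumptions made for this example.

```go
package main

import (
	"fmt"
	"math"
)

// optimalHashes returns k = ln2 * (m/n), the hash count that minimises the error rate.
func optimalHashes(m, n float64) float64 {
	return math.Ln2 * m / n
}

// falsePositiveRate returns f = 2^(-k) evaluated at the optimal k.
func falsePositiveRate(m, n float64) float64 {
	return math.Pow(2, -optimalHashes(m, n))
}

func main() {
	m, n := 10000.0, 1000.0 // illustrative: 10 bits per expected element
	fmt.Printf("k = %.2f, f = %.4f\n", optimalHashes(m, n), falsePositiveRate(m, n))
}
```

With 10 bits per element this gives about 7 hash functions and an error rate a little under 1%.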

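3.6 Simple rate limiting

A sliding window is defined by a range of zset scores. The score stores a timestamp, so the number of elements inside the time window can be counted to decide whether the access count exceeds the limit.

A zset is unsuitable for very large limits, for example 1,000,000 requests per 60 seconds: a zset with a million elements takes too much space.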
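
The sliding-window idea can be sketched in plain Go, with a slice of timestamps playing the role of the zset scores. This is an in-memory illustration under that assumption; the type slidingWindow is hypothetical, and this variant does not record rejected requests.

```go
package main

import "fmt"

// slidingWindow keeps request timestamps in milliseconds, oldest first,
// standing in for the zset scores described above.
type slidingWindow struct {
	periodMs int64
	maxCount int
	stamps   []int64
}

// allow records a request at nowMs and reports whether it fits in the window.
func (w *slidingWindow) allow(nowMs int64) bool {
	// Drop entries that have left the window (a range removal by score in zset terms).
	cut := 0
	for cut < len(w.stamps) && w.stamps[cut] <= nowMs-w.periodMs {
		cut++
	}
	w.stamps = w.stamps[cut:]
	if len(w.stamps) >= w.maxCount {
		return false
	}
	w.stamps = append(w.stamps, nowMs)
	return true
}

func main() {
	w := &slidingWindow{periodMs: 1000, maxCount: 2}
	fmt.Println(w.allow(0), w.allow(100), w.allow(200)) // true true false
	fmt.Println(w.allow(1050))                          // true: the entry at 0 has left the window
}
```

The memory cost is one entry per request inside the window, which is exactly why the approach does not scale to very large limits.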

3.7 Funnel rate limiting

redis-cell
A Redis module that provides rate limiting in Redis as a single command. Implements the fairly sophisticated generic cell rate algorithm (GCRA) which provides a rolling time window and doesn't depend on a background drip process.

A funnel has limited capacity: liquid can be poured in while it is not full; once full, no more fits until some of the liquid has drained away.

CL.THROTTLE user123 15 30 60 1
               ▲     ▲  ▲  ▲ ▲
               |     |  |  | └───── apply 1 token (default if omitted)
               |     |  └──┴─────── 30 tokens / 60 seconds
               |     └───────────── 15 max_burst
               └─────────────────── key "user123"

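3.8 GeoHash

GeoHash quickly finds the elements around a given longitude and latitude.

GeoHash treats the 2D plane as a grid and repeatedly bisects it by rows and columns, encoding a 2D coordinate into a one-dimensional integer.

In Redis, GeoHash encodes longitude and latitude into a 52-bit integer and stores it in a zset: the score is the encoded integer (a zset's floating-point score can hold a 52-bit integer losslessly) and the value is the element. Within the zset, the skiplist makes it easy to find other elements near a given one. To use the coordinates, decode the integer back into longitude and latitude.

geoadd adds coordinates;

geodist computes the distance between two elements;

geopos reads an element's coordinates;

geohash reads an element's encoded coordinate string (base32-encoded).

georadiusbymember finds other elements near a given element.

Note: in a cluster, a single key should not hold too many coordinates (more than 1 MB of data), otherwise cluster migration may stall. Alternatively, use a dedicated standalone instance instead of a cluster.

3.9 scan

keys lists the keys matching a pattern by traversing all keys, with time complexity O(N).

scan starts from a given cursor, matches a pattern and scans count slots. Unlike keys, scan does not have to traverse all of Redis's slots in one call.

Redis itself is essentially one large HashMap. scan traverses slots using addition with carry from the high bits, which avoids visiting slots twice or missing them when the dictionary grows or shrinks.

zscan iterates over zset elements;

sscan iterates over set elements;

hscan iterates over hash elements.

3.10 Avoiding big keys

Avoid creating big keys in application code.

Big keys tend to cause stalls during cluster migration, when a container needs to grow, and even when memory is reclaimed (because large blocks are allocated and freed).

The --bigkeys option scans for big keys.

redis-cli --bigkeys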

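4 Internals

4.1 I/O model

Redis is a single-threaded program.

Redis uses non-blocking I/O multiplexing to make single-threaded I/O efficient.

For each client socket connection, Redis associates:

A command queue, holding commands read from the client socket. Commands in the queue are processed FCFS;
A response queue, holding responses to be written to the client socket. If the response queue is empty there is nothing to send, so the connection is removed from the multiplexer's write_fds to save select cost.

Timed tasks are managed with a min-heap:

The nearest task sits at the top of the heap;
The time remaining until the top task is used as the timeout of select, so select can safely block for that long without missing a timed task.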
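
The timer bookkeeping can be sketched with Go's container/heap. The names timerHeap, newTimers and nextTimeout are hypothetical, and returning -1 for "no timers, block indefinitely" is a convention chosen for this example only.

```go
package main

import (
	"container/heap"
	"fmt"
)

// timerHeap is a min-heap of task deadlines in milliseconds.
type timerHeap []int64

func (h timerHeap) Len() int            { return len(h) }
func (h timerHeap) Less(i, j int) bool  { return h[i] < h[j] }
func (h timerHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *timerHeap) Push(x interface{}) { *h = append(*h, x.(int64)) }
func (h *timerHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// newTimers builds a heap from a list of deadlines.
func newTimers(deadlines ...int64) *timerHeap {
	h := make(timerHeap, len(deadlines))
	copy(h, deadlines)
	heap.Init(&h)
	return &h
}

// nextTimeout returns how long select may block before the nearest task is due.
func nextTimeout(h *timerHeap, nowMs int64) int64 {
	if h.Len() == 0 {
		return -1 // no timers: block indefinitely
	}
	if d := (*h)[0] - nowMs; d > 0 {
		return d
	}
	return 0 // the nearest task is already due
}

func main() {
	timers := newTimers(900, 300, 500)
	fmt.Println(nextTimeout(timers, 100)) // 200
	heap.Pop(timers)                      // the task due at 300 has run
	fmt.Println(nextTimeout(timers, 100)) // 400
}
```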

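4.2 Communication protocol

RESP (Redis Serialization Protocol) is the communication protocol used by Redis. It is a text protocol that is simple to implement and fast to parse.

RESP divides data into five basic unit types with these rules:

Single-line string, starting with +;
Multi-line string, starting with $;
Integer, starting with :;
Error message, starting with -;
Array, starting with *.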
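
As an illustration of these rules, a client command is sent as an array of multi-line strings, with every unit terminated by CRLF (\r\n). The sketch below encodes a command this way; the function name encodeCommand is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// encodeCommand serialises a command as a RESP array (*) of multi-line strings ($).
func encodeCommand(args ...string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "*%d\r\n", len(args)) // array header: number of elements
	for _, a := range args {
		fmt.Fprintf(&b, "$%d\r\n%s\r\n", len(a), a) // byte length, then the payload
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", encodeCommand("SET", "name", "Mike"))
	// "*3\r\n$3\r\nSET\r\n$4\r\nname\r\n$4\r\nMike\r\n"
}
```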

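4.3 Persistence

4.3.1 Snapshot

Redis forks a child process to dump the snapshot.

Memory is shared copy-on-write (COW), so the parent process keeps serving requests and modified data is written to new pages, while the child still sees the memory as it was at fork time. There is no need to worry about data changing during the dump.

4.3.2 AOF log

The AOF log records the sequence of all modifying commands since the Redis instance was created.

When Redis receives a modifying command from a client, it validates and executes it; if the command succeeds, its text is immediately appended to the AOF log.

AOF rewrite: a long history of modifications accumulates a large AOF log. Redis can start a child process that walks the data and generates a new AOF command log to replace the old one, slimming the log down. (Frequent modifications of the same key produce many AOF entries, but only one needs to be kept.)

fsync: Redis calls fsync periodically to make sure the AOF log actually reaches the disk, so that a sudden power loss does not lose data still held in memory buffers.

Hybrid persistence: snapshot + AOF log (incremental). This speeds up restart by avoiding a replay of the whole AOF log.

4.4 Pipeline

The Redis client reorders commands, grouping the reads together and the writes together. The client then writes to the operating system's network write buffer once and reads from the read buffer once, and the same holds on the server side. This reduces the number of network reads and writes.

4.5 Transactions

Redis offers transaction functionality with counterparts of begin, commit and rollback (the commands multi, exec and discard; discard drops the queued commands, it does not undo commands that have already run).

4.6 PubSub

PubSub stands for Publisher/Subscriber.

Message multicast: one Publisher can deliver messages to many Subscribers.

A Subscriber first subscribes to some channels; when a Publisher then publishes data to a channel, Redis delivers it to every Subscriber of that channel.

However, if a subscriber goes offline and comes back later, it does not receive the messages it missed. When Redis goes down, it is as if there were no subscribers at all, and all messages are simply lost.

Starting with version 5.0, Redis provides the new Stream data structure, which implements a persistent message queue.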

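4.7 Saving space

4.7.1 32-bit vs 64-bit

A 32-bit build of Redis spends half as much memory on pointers as a 64-bit build. If memory use stays under 4 GB, the 32-bit build is sufficient.

4.7.2 Compact storage of small objects with ziplist

In a traditional linked list each entry is a node with its own prev/next pointers. A ziplist instead stores many entries as an array inside one node, reducing the pointer overhead.

Each ziplist node stores:

zlbytes, 4 bytes, the number of bytes the ziplist occupies;
zltail, 4 bytes, the offset of the last entry;
zllen, 2 bytes, the number of entries in the ziplist;
the entry array, holding the entries;
zlend, 1 byte, the magic number 255 marking the end.

4.7.3 Compact integer arrays with intset

The intset data structure contains:

encoding, the bit width of each value;
length, the number of elements;
the value array, holding the values.

If 16-bit integers suffice, the intset uses 16-bit values; it is upgraded dynamically to 32-bit or 64-bit values when needed.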
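
The upgrade rule amounts to choosing the smallest width that holds every element. The sketch below assumes signed 16/32/64-bit encodings (an assumption of this example; the note above writes the widths as uint16/uint32/uint64), and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// encodingBytes returns the smallest element width in bytes able to hold v,
// assuming signed 16/32/64-bit encodings.
func encodingBytes(v int64) int {
	switch {
	case v >= math.MinInt16 && v <= math.MaxInt16:
		return 2
	case v >= math.MinInt32 && v <= math.MaxInt32:
		return 4
	default:
		return 8
	}
}

// intsetWidth returns the width the whole set must use: the widest element wins.
func intsetWidth(values []int64) int {
	width := 2
	for _, v := range values {
		if w := encodingBytes(v); w > width {
			width = w
		}
	}
	return width
}

func main() {
	fmt.Println(intsetWidth([]int64{1, 2, 3}))     // 2
	fmt.Println(intsetWidth([]int64{1, 2, 70000})) // 4: one large value upgrades the set
	fmt.Println(intsetWidth([]int64{1, 1 << 40}))  // 8
}
```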

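4.7.4 Memory reclamation

When keys are deleted, memory is not all returned to the operating system at once; part of it is retained for future use.

4.7.5 Memory allocators

Redis can use several memory allocators:

jemalloc, from Facebook;
tcmalloc, from Google.

Redis uses jemalloc by default, which performs slightly better.

info memory shows which allocator is currently in use.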

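5 Clustering

Several Redis nodes form a Redis cluster.

5.1 Redis clusters and the CAP theorem

5.1.1 CAP theorem

The CAP theorem says that a distributed system cannot have consistency (Consistency), availability (Availability) and partition tolerance (Partition tolerance) all at once; at most two of the three can be satisfied.

When the network fails and distributed nodes cannot reach each other, a network partition forms. To tolerate the partition there are two choices:

Keep availability: allow reads and writes on every node. Nodes then become inconsistent because they cannot synchronize immediately, so strong consistency is given up (AP);
Give up availability: allow only reads on every node and forbid writes. Data stays consistent across nodes, but users cannot update data, so availability is lost (CP).

In other words, when a network partition occurs, consistency and availability cannot both be kept.

5.1.2 Eventual consistency

Redis master and slave nodes synchronize asynchronously and cannot guarantee strict strong consistency, so Redis chooses to give up consistency in favour of availability and partition tolerance.

What Redis provides is eventual consistency (eventually consistent): while the network is down, master and slave diverge, but once the network recovers they catch up as quickly as possible using several strategies and finally become consistent.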

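5.2 Cluster synchronization

5.2.1 Master-slave sync and slave-slave sync

Master-slave sync: the master node replicates its data to a slave node.

Slave-slave sync: a slave node replicates its data to another slave node.

Introducing slave-slave sync reduces the synchronization load on the master.

5.2.2 Incremental sync

Redis synchronizes a stream of commands.

The master records write commands in a local command buffer and asynchronously sends the buffered commands to the slaves; this is incremental sync.

The command buffer is a fixed-length ring array, so once it is full, writing wraps around to the start and overwrites the old content. If a node receives many write commands during a network partition, commands that have not been synchronized yet may be overwritten and lost, so the command buffer alone cannot be relied on to hold unsynchronized commands.

5.2.3 Snapshot sync

Snapshot sync: run bgsave to dump all in-memory data into a snapshot file on disk.

Adding a slave: a new slave first loads the full data set through snapshot sync and then continues with incremental sync.

Snapshot sync infinite loop: if the snapshot sync is too slow or the command buffer too small, the buffer fills up before the snapshot sync finishes. New commands then overwrite the buffer, the snapshot becomes stale, and a new snapshot has to be taken, which may again be too slow and let the buffer fill up again, and so on. To avoid this loop, configure a suitably large command buffer.

5.2.4 Diskless replication

Snapshot sync writes to disk, which has a considerable file I/O cost. Moreover, Redis has to fsync when writing the AOF; if a snapshot sync is running at that moment, the fsync is delayed, which delays the AOF and in turn delays command execution.

For this reason Redis supports diskless replication since 2.8.18: the master sends the snapshot directly to the slave over a socket, avoiding the file I/O on disk.

5.2.5 Synchronous replication with the wait command

Redis replication is asynchronous by nature, so it is not strongly consistent.

The wait command makes replication synchronous, giving strong consistency (as long as there is no network partition).

wait blocks, for a bounded time or indefinitely, until N slaves have caught up, and only then are subsequent commands executed.

With an indefinite wait, a network partition means the sync can never complete, so the call blocks forever and Redis loses availability.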

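5.3 Sentinel: automatic master-slave failover

A Redis Sentinel cluster usually consists of 3 to 5 Sentinel nodes, which keeps Sentinel itself available.

The Sentinel cluster continuously monitors the state of the master and slave nodes. When a problem occurs, it automatically promotes an available slave to master, replacing the failed master.

The Sentinel workflow:

The Client first asks Sentinel for the master's address;
Sentinel returns the current master address to the Client;
The Client accesses the master.

5.4 Codis: a centralized cluster solution

Codis is one of the Redis cluster solutions; TiDB was later developed building on Codis.

If a single Redis node stores too much data, its rdb snapshot file becomes very large, which makes synchronization slow and full recovery slow as well.

Codis spreads the data across many Redis nodes so that no single node holds too much.

Codis hashes each key and takes the result modulo 1024 to obtain one of 1024 slots, which determines the node the data maps to. Once slots are assigned, Codis stores the mapping between slots and Redis nodes.

Scaling Codis: cluster capacity is increased by adding Redis nodes.

With mget, Codis fetches data from the scattered nodes and merges the results for the user.

5.5 Cluster: a decentralized cluster solution

Redis Cluster is a decentralized cluster solution in which all nodes are peers.

Redis Cluster divides the data into 16384 slots ($2^{14}$), and each node is responsible for a subset of them. The client determines the slot from the key and from that the target node. If a client sends a request to the wrong node, that node computes the slot for the key and replies with a redirection telling the client which node to contact.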
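
The key-to-slot step can be sketched as follows. Redis Cluster computes the slot as CRC16(key) mod 16384, using the XMODEM variant of CRC16; the sketch ignores hash tags (the {...} syntax that lets related keys share a slot), and the function names are hypothetical.

```go
package main

import "fmt"

// crc16 computes CRC16/XMODEM: polynomial 0x1021, initial value 0, no bit reflection.
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = (crc << 1) ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// keySlot maps a key to one of the 16384 slots (hash tags are not handled here).
func keySlot(key string) int {
	return int(crc16([]byte(key))) % 16384
}

func main() {
	fmt.Printf("%#x\n", crc16([]byte("123456789"))) // 0x31c3, the standard check value
	fmt.Println(keySlot("123456789"))               // 12739
}
```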

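Slot migration: the unit of migration is the slot. The content is read from the source node, stored on the target node, and finally deleted from the source node.

Fault tolerance: Redis Cluster lets each master have several slaves and automatically promotes a slave to master on failure.

Possibly failed and failed: cluster nodes broadcast their state with the Gossip protocol. When one node finds another unreachable, it marks it as possibly failed (PFail, Possibly Fail). When the majority of nodes in the cluster have received the report that the node is unreachable, the node is marked as failed (Fail).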

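6 Extensions

6.1 Stream: the message queue of Redis 5.0

Redis Stream, introduced in Redis 5.0, is a new persistent message queue with multicast support that borrows heavily from Kafka's design.

A Redis Stream links all appended messages in a message list. Each message has a unique ID (timestampInMillis-sequence) and a body (key-value pairs shaped like a hash).

Consumer group: a Stream can have several consumer groups (Consumer Group). Groups are independent of one another, and each has a cursor last_delivered_id that moves forward along the Stream to mark which message the group has consumed up to.

Consumer: a consumer group can contain several consumers (Consumer), which compete with one another; a read by any consumer moves the group's last_delivered_id forward.

PEL: each consumer has a PEL (Pending Entries List), a state list pending_ids recording the IDs of messages that have been read by the client but not yet acknowledged (ACK). The PEL ensures that a message is consumed at least once rather than lost in transit. After reconnecting, a client can use the PEL to fetch again the messages it failed to receive.

Partitioning: Redis has no native partitioning; a partitioned Stream can be built by applying a hash strategy in the client. Kafka's native Partition support likewise uses a client-side HashStrategy to decide which partition a message goes to.

xgroup create: creates a consumer group; the message ID to start consuming from must be given.

xadd: appends a message;

xdel: deletes a message by setting a flag only, without physically removing it;

xrange: returns a list of messages, automatically skipping those flagged as deleted;

xlen: returns the number of messages;

del: deletes the whole message list with all its messages.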

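6.2 info: status diagnostics

The info command reports:

server: server information;
clients: client information;
memory: runtime memory statistics;
persistence: persistence information;
stats: general statistics;
replication: master-slave replication;
cpu: CPU usage;
cluster: cluster information;
keyspace: key-value statistics.

Example queries:

Inside Redis:

> info memory

Outside Redis:

redis-cli info memory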

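6.3 Redlock distributed lock

In a Sentinel cluster, when the master dies a slave takes over. If the distributed lock on the master had not been replicated to that slave, the newly promoted master does not have the lock, which makes the lock unsafe.

Redlock works over several peer Redis instances and relies on a majority: to lock, it sends set to more than half of the nodes, and the lock is acquired if more than half of them succeed; to unlock, it sends del to all nodes. Because Redlock reads and writes several nodes and has to handle retries on failure, clock drift and similar issues, it is somewhat slower than a single Redis instance.

6.4 Expiration policy

Keys with an expire time are kept in a separate dictionary.

Redis combines periodic scanning with a lazy policy.

Periodic scanning: by default Redis runs 10 expiration scans per second, each as follows:

Pick 20 random keys from the expiration dictionary;
Delete those of the 20 keys that have expired;
If more than 1/4 of the keys were expired, repeat from the first step.

To keep the loop from stalling the thread, a scan is capped at 25 ms by default. The reasoning: 10 scans per second at 25 ms each take at most 250 ms, i.e. 1/4 of the CPU time. What Redis really limits is CPU time, so that expiration scanning never uses more than 1/4 of it.

If many keys expire at the same moment, Redis keeps looping over the dictionary and deleting keys until the share of expired keys drops. With a large number of expired keys the scan can easily hit the 25 ms cap, and together with the cost of reclaiming memory this burns a fair amount of CPU. If a request arriving at that moment sets a very short expiration, say 10 ms, the scan may start right after the value is set; by the time the 25 ms scan finishes and the client's read is served, the key has long expired. The client finds that a value it has just set is already expired when it immediately comes back to it, when in fact the time was consumed by the expiration scan in between.

To avoid this, expiration times should not be too short given the scan time, and many keys should not expire at once; adding a random offset to an otherwise uniform expiration time already helps.

Lazy policy: when a key is accessed its expiration time is checked, and the key is deleted if it has expired.

Slaves do not expire keys on their own. The master deletes the key and propagates a del to the slaves, which write it to their AOF and simply follow the master. Because replication is asynchronous, strong consistency between master and slave is not guaranteed.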

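6.5 Memory eviction

Redis should not be allowed to swap, because swapping causes a sharp drop in performance.

When the memory actually used by Redis exceeds maxmemory, one of several maxmemory-policy settings applies:

noeviction: reads are served, writes are rejected;
volatile-lru: evict the least recently used (LRU) key among keys with an expiration;
volatile-ttl: evict the key with the smallest remaining TTL among keys with an expiration;
volatile-random: evict a random key among keys with an expiration;
allkeys-lru: evict by LRU among all keys;
allkeys-random: evict a random key among all keys.

6.6 Lazy deletion

del deletes directly and is usually very fast, but deleting a very large object stalls the single thread.

unlink, introduced in Redis 4.0, solves the stall: it detaches the object to be deleted and hands it to a background thread that reclaims the memory asynchronously.

Redis 4.0 also made flushdb and flushall asynchronous via the async option, for example: flushall async.

Asynchronous deletion is built on an asynchronous queue: the MainThread puts the object to delete into a ConcurrentQueue via submitTask, and the lazy deletion thread LazyFreeThread takes it with fetchTask and performs the deletion.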
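
The queue-plus-worker pattern maps naturally onto a Go channel and a goroutine. The sketch below only illustrates the pattern; the type lazyFreer and its methods are hypothetical, and adding up byte counts stands in for the actual memory reclamation.

```go
package main

import (
	"fmt"
	"sync"
)

// lazyFreer hands objects to a background goroutine for release, so the
// caller (the "main thread") only pays for an enqueue.
type lazyFreer struct {
	tasks chan []byte
	wg    sync.WaitGroup
	freed int // bytes released; written only by the worker goroutine
}

func newLazyFreer() *lazyFreer {
	f := &lazyFreer{tasks: make(chan []byte, 16)}
	f.wg.Add(1)
	go func() {
		defer f.wg.Done()
		for obj := range f.tasks {
			f.freed += len(obj) // stand-in for the real reclamation work
		}
	}()
	return f
}

// unlink detaches the object and queues it without waiting for the release.
func (f *lazyFreer) unlink(obj []byte) { f.tasks <- obj }

// shutdown stops the worker once the queue has drained and returns the bytes released.
func (f *lazyFreer) shutdown() int {
	close(f.tasks)
	f.wg.Wait()
	return f.freed
}

func main() {
	f := newLazyFreer()
	f.unlink(make([]byte, 10))
	f.unlink(make([]byte, 5))
	fmt.Println(f.shutdown()) // 15
}
```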

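Redis's AOFSync has to flush the AOF log to disk by calling sync. Since sync is slow, it is called from an asynchronous thread, which has its own task queue holding AOFSync tasks.

Besides del and flush, Redis also reclaims memory when keys expire, on LRU eviction and when the rename command runs. A node also clears its memory to load the data after receiving a full-sync rdb file. These deletion scenarios have their own options:

slave-lazy-flush: the flush a slave performs after receiving the rdb file;
lazyfree-lazy-eviction: eviction when memory reaches maxmemory;
lazyfree-lazy-expire: deletion of expired keys;
lazyfree-lazy-server-del: deletion of destKey by the rename command.

6.7 Jedis

Jedis is an open-source Redis client for Java.

A Jedis object is not thread-safe, so the usual pattern is to take a Jedis object from the JedisPool connection pool for exclusive use by the current thread and return it to the pool afterwards.

Jedis has no retry mechanism by default: if a network glitch drops the connection, the next command raises an error. Catch JedisConnectionException manually and reconnect.

6.8 Redis security

6.8.1 Command security

rename-command can rename an existing command, or rename it to the empty string, which prevents the command from being called.

6.8.2 Port security

The bind directive specifies the IP addresses to listen on.

requirepass sets a password for access; on a slave, masterauth sets the password used for the replication connection to the master.

6.8.3 Script security

Avoid running user-generated (UGC) Lua scripts.

Avoid starting Redis with root privileges.

6.8.4 SSL proxy

Use SSH to protect Redis connections.

The officially recommended tool is spiped, an encrypting proxy, which can add a second layer of encryption on top of an SSH channel.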

A first look at distributed systems: the CAP theorem and BASE theory

2020-08-19T08:04:38.000Z

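A first study of distributed systems, aimed at understanding the CAP theorem and BASE theory.

The traditional single-machine transaction model struggles with distributed transactions, which call for distributed systems. The nodes of a distributed system are spread across a network, so strict ACID properties are hard to achieve in the way a traditional centralized transaction system does.

1 CAP theorem

1.1 Background

In July 2000, Professor Eric Brewer of the University of California, Berkeley proposed the CAP conjecture at the ACM PODC (Principles of Distributed Computing) conference.

Two years later, Seth Gilbert and Nancy Lynch of MIT proved the conjecture formally, and the CAP theorem has since been an accepted theorem in distributed computing.

1.2 Theorem

CAP theorem: a distributed system cannot satisfy consistency (Consistency), availability (Availability) and partition tolerance (Partition tolerance) all at the same time; at most two of the three can hold together.

1.2.1 Consistency

Consistency means consistency across replicas. In a distributed system, if one replica is updated and the others are not updated in time, reads from the other replicas still return old data, i.e. the replicas are inconsistent.

All nodes have the same data at the same time.

1.2.2 Availability

Availability means the service provided by the system must always be usable, i.e. every user request gets a result within a bounded time.

Every request receives a response, whether it succeeds or fails.

1.2.3 Partition tolerance

Partition tolerance means that when the distributed system meets any network partition failure, it can still provide a service that satisfies consistency and availability, unless the whole network has failed.

The loss or failure of arbitrary messages in the system does not stop the system from operating.

Notes:

In a distributed system, nodes sit in different subnetworks. Links between subnetworks may break while each subnetwork works internally, which splits the system into several isolated regions.
Every node joining or leaving the distributed system can be seen as a special network partition.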

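1.3 Application

According to the CAP theorem, a distributed system has to make a trade-off in practice: it can satisfy at most two properties, which means choosing one to give up.

Choice	Description	Examples
CA: give up partition tolerance (-P)	Single-site cluster systems. Giving up partition tolerance means giving up scalability. The simple way to avoid partition problems is to keep all data (or at least the transaction-related data) on one node, so that a network partition cannot separate a service from the data it depends on.	RDBMS
CP: give up availability (-A)	Once the system meets a network partition or another failure, the affected services need some time to recover and are unavailable in the meantime. Systems that satisfy consistency and partition tolerance usually do not have very high performance.	MongoDB, HBase, Redis
AP: give up consistency (-C)	Give up strong consistency and guarantee eventual consistency instead. A time window is introduced: replicas are copied between nodes at intervals.	CouchDB, Cassandra, DynamoDB, Riak

In detail:

CA: give up partition tolerance
- RDBMS: relational database management system (Relational Database Management System), not scalable.
CP: give up availability
- MongoDB: NoSQL, document-oriented;
- HBase: Hadoop Database, column-oriented;
- Redis: Remote Dictionary Server, key-value store;
AP: give up consistency
- CouchDB: document-oriented;
- Cassandra: column-oriented;
- DynamoDB: document-oriented;
- Riak: key-value store;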

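2 BASE theory

The name BASE is an acronym of:

Basically Available
Soft state
Eventually consistent

2.1 Background

BASE theory was first proposed by eBay architect Dan Pritchett in the article BASE: An Acid Alternative. It is the result of weighing consistency against availability in CAP.

2.2 Theory

The core idea of BASE is: even when strong consistency (Strong consistency) cannot be achieved, each application can, according to its own business characteristics, use a suitable approach to bring the system to eventual consistency (Eventual consistency).

Strong consistency is sacrificed to gain availability.

2.2.1 Basically Available

When an unforeseen failure occurs, the distributed system is allowed to lose part of its availability.

For example:

Loss in response time: during a failure, response time increases to some extent;
Loss in functionality: at the peak of a shopping festival, some users are directed to a degraded page.

2.2.2 Soft state

Data in the system is allowed to be in an intermediate state, which is considered not to affect overall availability; that is, synchronization between replicas on different nodes is allowed to lag.

2.2.3 Eventually consistent

All replicas in the system reach a consistent state after a period of synchronization.

Real-time consistency is not required. The delay before consistency is reached depends on network latency, system load, the design of the replication scheme and similar factors.

In engineering practice, eventual consistency has five variants:

Causal consistency: after process A modifies data and notifies process B, process B should read the new value.
Read your writes: after process A modifies data and reads it back, it should get the new value.
Session consistency: the system guarantees read-your-writes within the same valid session.
Monotonic read consistency: once a process has read a new value, later reads should not return an older one.
Monotonic write consistency: the writes of one process should be executed in order.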

Selector - Understanding Java I/O multiplexing from the JDK 11 source

2020-08-09T09:07:47.000Z

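While reading the JDK 11 source, I found that although the type is java.nio.channels.Selector on both platforms, the Selector built by Selector.open() has a completely different underlying implementation on Windows and on Linux.

Selector is the core multiplexing selector of Java NIO. A thread registers a SocketChannel and a selection key on a Selector, and the Selector selects the SocketChannel instances whose I/O state matches the selection key's conditions.

Application layer

A thread registers a SocketChannel instance and a selection key on the Selector: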

try {
    socketChannel.register(this.selector, SelectionKey.OP_READ);    // socketChannel is always Writable
    // socketChannel.register(this.selector, SelectionKey.OP_READ | SelectionKey.OP_WRITE);
    this.selector.wakeup();
} catch (ClosedChannelException e) {
    e.printStackTrace();
}

The Selector returns the set of SocketChannels whose I/O state matches their selection keys, which is then iterated and handled:

try {
    this.selector.select();

    Set<SelectionKey> selectionKeySet = this.selector.selectedKeys();
    Iterator<SelectionKey> selectionKeys = selectionKeySet.iterator();

    while (selectionKeys.hasNext()) {
        SelectionKey selectionKey = selectionKeys.next();

        if (selectionKey.isReadable()) {
            selectionKey.cancel();      // avoid repeating selecting the same channel
            SocketChannel socketChannel = (SocketChannel) selectionKey.channel();

            HttpWorker httpWorker = new HttpWorker(webRoot, socketChannel);
            this.executorService.submit(httpWorker);
        }

        selectionKeys.remove();
    }

} catch (IOException e) {
    e.printStackTrace();
}

Abstraction layer

Selector

Selector.open()

External code obtains a Selector instance through the Selector.open() method:

import java.nio.channels.Selector;

Selector selector = Selector.open();

In the Selector abstract class it is implemented as:

public abstract class Selector implements Closeable {

    /* more */
    
    /**
     * Opens a selector.
     *
     *  The new selector is created by invoking the {@link
     * java.nio.channels.spi.SelectorProvider#openSelector openSelector} method
     * of the system-wide default {@link
     * java.nio.channels.spi.SelectorProvider} object.  
     *
     * @return  A new selector
     *
     * @throws  IOException
     *          If an I/O error occurs
     */
    public static Selector open() throws IOException {
        return SelectorProvider.provider().openSelector();
    }

    /* more */
    
}

This is really just a layer of abstraction: the call is delegated to SelectorProvider, which supplies and opens the Selector instance.
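A quick way to observe this delegation from the outside is sketched below. OpenDelegationDemo is a hypothetical class name; Selector.provider() and SelectorProvider.openSelector() are part of the public API.

```java
import java.io.IOException;
import java.nio.channels.Selector;
import java.nio.channels.spi.SelectorProvider;

public class OpenDelegationDemo {

    /** Returns true if Selector.open() behaves like provider().openSelector(). */
    public static boolean openedByDefaultProvider() throws IOException {
        try (Selector viaFacade = Selector.open();
             Selector viaProvider = SelectorProvider.provider().openSelector()) {
            // Both selectors should report the same provider and share a concrete class.
            return viaFacade.provider() == SelectorProvider.provider()
                    && viaFacade.getClass() == viaProvider.getClass();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("opened by default provider: " + openedByDefaultProvider());
    }
}
```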

Selector.select()

This is the core of multiplexing: it selects the set of keys whose channels are ready for I/O.

public abstract class Selector implements Closeable {

    /* more */
    
    /**
     * Selects a set of keys whose corresponding channels are ready for I/O
     * operations.
     *
     *  This method performs a blocking selection
     * operation.  It returns only after at least one channel is selected,
     * this selector's {@link #wakeup wakeup} method is invoked, the current
     * thread is interrupted, or the given timeout period expires, whichever
     * comes first.
     *
     * This method does not offer real-time guarantees: It schedules the
     * timeout as if by invoking the {@link Object#wait(long)} method.
     *
     * @param  timeout  If positive, block for up to {@code timeout}
     *                  milliseconds, more or less, while waiting for a
     *                  channel to become ready; if zero, block indefinitely;
     *                  must not be negative
     *
     * @return  The number of keys, possibly zero,
     *          whose ready-operation sets were updated
     *
     * @throws  IOException
     *          If an I/O error occurs
     *
     * @throws  ClosedSelectorException
     *          If this selector is closed
     *
     * @throws  IllegalArgumentException
     *          If the value of the timeout argument is negative
     */
    public abstract int select(long timeout) throws IOException;
    
    /* more */
    
}

The method is declared in the abstract class Selector, but its implementation lives in the subclass SelectorImpl.

SelectorImpl

/**
 * Base Selector implementation class.
 */

abstract class SelectorImpl
    extends AbstractSelector
{
    
    /* more */
    
    /**
     * Selects the keys for channels that are ready for I/O operations.
     *
     * @param action  the action to perform, can be null
     * @param timeout timeout in milliseconds to wait, 0 to not wait, -1 to
     *                wait indefinitely
     */
    protected abstract int doSelect(Consumer<SelectionKey> action, long timeout)
        throws IOException;

    private int lockAndDoSelect(Consumer<SelectionKey> action, long timeout)
        throws IOException
    {
        synchronized (this) {
            ensureOpen();
            if (inSelect)
                throw new IllegalStateException("select in progress");
            inSelect = true;
            try {
                synchronized (publicSelectedKeys) {
                    return doSelect(action, timeout);
                }
            } finally {
                inSelect = false;
            }
        }
    }

    @Override
    public final int select(long timeout) throws IOException {
        if (timeout < 0)
            throw new IllegalArgumentException("Negative timeout");
        return lockAndDoSelect(null, (timeout == 0) ? -1 : timeout);
    }

    @Override
    public final int select() throws IOException {
        return lockAndDoSelect(null, -1);
    }

    /* more */
    
}

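SelectorImpl's select methods call lockAndDoSelect. The arguments passed (a null action and, by default, a timeout of -1) mean that no action is run on the selected keys and that the call waits indefinitely.

Inside lockAndDoSelect, the synchronized keyword guards the current Selector object to provide thread synchronization, and the inSelect flag additionally prevents overlapping select operations. The method that does the actual work is doSelect, which is declared in SelectorImpl but not implemented there. Its implementation is left to the subclasses, that is, to the implementation layer.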
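The guard pattern can be isolated in a few lines. The sketch below is not the JDK class; GuardedSelect and lockAndRun are hypothetical names. It also shows why the flag is useful on top of the lock: Java monitors are reentrant, so synchronized alone would let the same thread start a second selection from inside the first.

```java
import java.util.function.IntSupplier;

public class GuardedSelect {

    private boolean inSelect; // true while a selection is running

    /** Runs body under the lock and rejects re-entrant calls. */
    public int lockAndRun(IntSupplier body) {
        synchronized (this) {
            if (inSelect)
                throw new IllegalStateException("select in progress");
            inSelect = true;
            try {
                return body.getAsInt();
            } finally {
                inSelect = false; // always cleared, even if body throws
            }
        }
    }

    public static void main(String[] args) {
        GuardedSelect guard = new GuardedSelect();
        System.out.println(guard.lockAndRun(() -> 3));
        try {
            // The inner call happens on the same thread while the flag is set.
            guard.lockAndRun(() -> guard.lockAndRun(() -> 1));
        } catch (IllegalStateException e) {
            System.out.println("re-entrant call rejected: " + e.getMessage());
        }
    }
}
```

Other threads simply block on the monitor until the running selection finishes; only a nested call from the same thread hits the IllegalStateException.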

SelectorProvider

SelectorProvider is an abstract class.

SelectorProvider.provider()

public abstract class SelectorProvider {

    /* more */

    /**
     * Returns the system-wide default selector provider for this invocation of
     * the Java virtual machine.
     *
     *  The first invocation of this method locates the default provider
     * object as follows: 
     *
     * 
     *
     *    If the system property
     *   {@code java.nio.channels.spi.SelectorProvider} is defined then it is
     *   taken to be the fully-qualified name of a concrete provider class.
     *   The class is loaded and instantiated; if this process fails then an
     *   unspecified error is thrown.  
     *
     *    If a provider class has been installed in a jar file that is
     *   visible to the system class loader, and that jar file contains a
     *   provider-configuration file named
     *   {@code java.nio.channels.spi.SelectorProvider} in the resource
     *   directory {@code META-INF/services}, then the first class name
     *   specified in that file is taken.  The class is loaded and
     *   instantiated; if this process fails then an unspecified error is
     *   thrown.  
     *
     *    Finally, if no provider has been specified by any of the above
     *   means then the system-default provider class is instantiated and the
     *   result is returned.  
     *
     * 
     *
     *  Subsequent invocations of this method return the provider that was
     * returned by the first invocation.  
     *
     * @return  The system-wide default selector provider
     */
    public static SelectorProvider provider() {
        synchronized (lock) {
            if (provider != null)
                return provider;
            return AccessController.doPrivileged(
                new PrivilegedAction<>() {
                    public SelectorProvider run() {
                            if (loadProviderFromProperty())
                                return provider;
                            if (loadProviderAsService())
                                return provider;
                            provider = sun.nio.ch.DefaultSelectorProvider.create();
                            return provider;
                        }
                    });
        }
    }

    /* more */
    
}

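This provider() method is a singleton guarded by a synchronized lock; it returns an instance of type SelectorProvider.

Specifically, when no instance exists yet, one has to be created.

The instance is created through AccessController, which performs a privileged action.

According to the official documentation, AccessController is used for access control operations and decisions.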

The AccessController class is used for access control operations and decisions. More specifically, the AccessController class is used for three purposes:
to decide whether an access to a critical system resource is to be allowed or denied, based on the security policy currently in effect,
to mark code as being "privileged", thus affecting subsequent access determinations, and
to obtain a "snapshot" of the current calling context so access-control decisions from a different context can be made with respect to the saved context.

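AccessController thus serves three purposes:

Checking permissions: deciding whether access to a critical system resource should be granted;
Granting privileges: marking code as privileged so that subsequent operations can be carried out;
Saving a snapshot: saving the current calling context so that access-control decisions made from another context can take the saved context into account.

Here, AccessController.doPrivileged(...) serves the second purpose, granting privileges and running privileged code: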

Performs the specified PrivilegedAction with privileges enabled.
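The action is performed with all of the permissions possessed by the caller's protection domain.

The argument is an anonymous class that implements the PrivilegedAction interface and its run() method. Relying on the privileges granted by the enclosing call, run() instantiates a SelectorProvider. Instantiation is attempted at three priority levels: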

loadProviderFromProperty()
loadProviderAsService()
provider = sun.nio.ch.DefaultSelectorProvider.create()

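First priority: `loadProviderFromProperty`

The first priority checks whether the system property java.nio.channels.spi.SelectorProvider is set. If it is, the named class is loaded; otherwise the method returns false.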

Service-provider classes for the java.nio.channels package.

The java.nio.channels.spi package provides a set of service-provider classes.

public abstract class SelectorProvider {

    /* more */

    private static boolean loadProviderFromProperty() {
        String cn = System.getProperty("java.nio.channels.spi.SelectorProvider");
        if (cn == null)
            return false;
        try {
            @SuppressWarnings("deprecation")
            Object tmp = Class.forName(cn, true,
                                       ClassLoader.getSystemClassLoader()).newInstance();
            provider = (SelectorProvider)tmp;
            return true;
        } catch (ClassNotFoundException x) {
            throw new ServiceConfigurationError(null, x);
        } catch (IllegalAccessException x) {
            throw new ServiceConfigurationError(null, x);
        } catch (InstantiationException x) {
            throw new ServiceConfigurationError(null, x);
        } catch (SecurityException x) {
            throw new ServiceConfigurationError(null, x);
        }
    }
    
    /* more */ 
    
}

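The method first reads the system properties and checks whether the key java.nio.channels.spi.SelectorProvider has a value. If not, it returns false; if so, it uses that value as the name of the SelectorProvider class to load.

The class named in the system property (if any) is loaded dynamically at run time through Class.forName, with the system class loader specified explicitly.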
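The same "class name in a system property" technique can be sketched generically. PropertyLoaderDemo, loadFromProperty and the property keys demo.list.impl and demo.unset are hypothetical names used only for this example; it also uses getDeclaredConstructor().newInstance() in place of the deprecated Class.newInstance() seen in the JDK code.

```java
import java.util.List;
import java.util.Optional;

public class PropertyLoaderDemo {

    /** Instantiates the class named by the given system property, if it is set. */
    public static <T> Optional<T> loadFromProperty(String key, Class<T> type) {
        String className = System.getProperty(key);
        if (className == null)
            return Optional.empty(); // nothing configured, let the caller fall back
        try {
            Class<?> cls = Class.forName(className, true,
                                         ClassLoader.getSystemClassLoader());
            Object instance = cls.getDeclaredConstructor().newInstance();
            return Optional.of(type.cast(instance));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        System.setProperty("demo.list.impl", "java.util.ArrayList");
        System.out.println(loadFromProperty("demo.list.impl", List.class).get().getClass());
        System.out.println(loadFromProperty("demo.unset", List.class).isPresent());
    }
}
```

For the real provider, the equivalent would be starting the JVM with -Djava.nio.channels.spi.SelectorProvider set to the fully qualified name of a provider class.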

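Second priority: `loadProviderAsService`

If the java.nio.channels.spi.SelectorProvider system property required by the first priority is not set, loading falls through to the second priority.

If a jar file visible to the system class loader contains a provider-configuration file named java.nio.channels.spi.SelectorProvider in its META-INF/services directory, the provider is loaded as a service through the system class loader.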

public abstract class SelectorProvider {

    /* more */

    private static boolean loadProviderAsService() {

        ServiceLoader<SelectorProvider> sl =
            ServiceLoader.load(SelectorProvider.class,
                               ClassLoader.getSystemClassLoader());
        Iterator<SelectorProvider> i = sl.iterator();
        for (;;) {
            try {
                if (!i.hasNext())
                    return false;
                provider = i.next();
                return true;
            } catch (ServiceConfigurationError sce) {
                if (sce.getCause() instanceof SecurityException) {
                    // Ignore the security exception, try the next provider
                    continue;
                }
                throw sce;
            }
        }
    }
    
    /* more */ 
    
}

The method loads the service through ServiceLoader and takes the first SelectorProvider instance that is visible.
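The lookup can be reproduced from user code with the public ServiceLoader API. This is only a sketch (ServiceLookupDemo and firstRegisteredProvider are hypothetical names), and what it finds depends entirely on what is on the class path; on a plain JDK with no extra jars I would expect it to find nothing, which is exactly the case that falls through to the default provider.

```java
import java.nio.channels.spi.SelectorProvider;
import java.util.Iterator;
import java.util.Optional;
import java.util.ServiceLoader;

public class ServiceLookupDemo {

    /** Returns the first SelectorProvider registered as a service, if any. */
    public static Optional<SelectorProvider> firstRegisteredProvider() {
        ServiceLoader<SelectorProvider> loader =
                ServiceLoader.load(SelectorProvider.class,
                                   ClassLoader.getSystemClassLoader());
        Iterator<SelectorProvider> it = loader.iterator();
        return it.hasNext() ? Optional.of(it.next()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstRegisteredProvider()
                .map(p -> p.getClass().getName())
                .orElse("no provider registered as a service"));
    }
}
```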

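Final priority: `sun.nio.ch.DefaultSelectorProvider`

If neither of the SelectorProviders above exists, sun.nio.ch.DefaultSelectorProvider is loaded as the last resort.

In practice, unless one of the first two options has been implemented and configured, this final priority is what takes effect by default.

DefaultSelectorProvider exposes a uniform interface; internally it does nothing more than instantiate an implementation class, and which class it instantiates depends on the operating system the JDK was built for.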

DefaultSelectorProvider

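DefaultSelectorProvider.create()

Specifically, sun.nio.ch.DefaultSelectorProvider exposes a uniform interface. Its create method is merely a thin wrapper around a single new expression, but its implementation differs between the JDK builds for different operating systems:

In the Windows JDK 11, it instantiates sun.nio.ch.WindowsSelectorProvider.
In the Linux JDK 11, it instantiates sun.nio.ch.EPollSelectorProvider.

Windows JDK 11: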

/**
 * Creates this platform's default SelectorProvider
 */

public class DefaultSelectorProvider {

    /**
     * Prevent instantiation.
     */
    private DefaultSelectorProvider() { }

    /**
     * Returns the default SelectorProvider.
     */
    public static SelectorProvider create() {
        return new sun.nio.ch.WindowsSelectorProvider();
    }

}

Linux JDK 11:

/**
 * Creates this platform's default SelectorProvider
 */

public class DefaultSelectorProvider {

    /**
     * Prevent instantiation.
     */
    private DefaultSelectorProvider() { }

    /**
     * Returns the default SelectorProvider.
     */
    public static SelectorProvider create() {
        return new EPollSelectorProvider();
    }

}

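WindowsSelectorProvider and EPollSelectorProvider

Both classes at this layer extend the abstract class SelectorProviderImpl. They do not implement any particular logic of their own; each one just instantiates its corresponding SelectorImpl implementation class.

This layer connects SelectorProvider to SelectorImpl.

How each SelectorImpl is actually implemented is analysed in the implementation-layer section that follows.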

WindowsSelectorProvider.openSelector()

/*
 * SelectorProvider for sun.nio.ch.WindowsSelectorImpl.
 *
 * @author Konstantin Kladko
 * @since 1.4
 */

public class WindowsSelectorProvider extends SelectorProviderImpl {

    public AbstractSelector openSelector() throws IOException {
        return new WindowsSelectorImpl(this);
    }
}

EPollSelectorProvider.openSelector()

public class EPollSelectorProvider
    extends SelectorProviderImpl
{
    public AbstractSelector openSelector() throws IOException {
        return new EPollSelectorImpl(this);
    }
    
    public Channel inheritedChannel() throws IOException {
        return InheritedChannel.getChannel();
    }
}

Inheritance summary

Selector family

abstract class Selector
- abstract class AbstractSelector extends Selector
  - abstract class SelectorImpl extends AbstractSelector
    - class WindowsSelectorImpl extends SelectorImpl
    - class EPollSelectorImpl extends SelectorImpl

SelectorProvider family

abstract class SelectorProvider
- abstract class SelectorProviderImpl extends SelectorProvider
  - class WindowsSelectorProvider extends SelectorProviderImpl
  - class EPollSelectorProvider extends SelectorProviderImpl
class DefaultSelectorProvider
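The Selector chain above can be checked on whatever JDK is at hand by walking the superclasses of a freshly opened selector. This is a small sketch (SelectorHierarchyDemo and superclassChain are hypothetical names); the concrete class printed first depends on the platform, so no output is shown.

```java
import java.io.IOException;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.List;

public class SelectorHierarchyDemo {

    /** Lists the class of an object followed by all of its superclasses. */
    public static List<String> superclassChain(Object o) {
        List<String> chain = new ArrayList<>();
        for (Class<?> c = o.getClass(); c != null; c = c.getSuperclass())
            chain.add(c.getName());
        return chain;
    }

    public static void main(String[] args) throws IOException {
        try (Selector selector = Selector.open()) {
            // Most specific class first, java.lang.Object last.
            superclassChain(selector).forEach(System.out::println);
        }
    }
}
```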

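Implementation layer

Implementation in the Windows JDK 11

In the Windows JDK 11, what gets instantiated and returned to the upper layer is sun.nio.ch.WindowsSelectorProvider.

WindowsSelectorProvider

As a brief recap, WindowsSelectorProvider bridges the public SelectorProvider and the concrete WindowsSelectorImpl implementation class.

The class extends the abstract class SelectorProviderImpl and is its concrete implementation, called from the abstraction layer. All it does is forward the call by instantiating the WindowsSelectorImpl implementation class.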

public class WindowsSelectorProvider extends SelectorProviderImpl {

    public AbstractSelector openSelector() throws IOException {
        return new WindowsSelectorImpl(this);
    }
}

WindowsSelectorImpl

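In short, WindowsSelectorImpl is built on a native poll method called through JNI. It is not a plain call, though: the implementation has been extended with multiple threads.

Why multiple threads? Because a single poll call can only handle a limited number of file descriptors; as with select, the limit is generally no more than 1024 (the MAX_SELECTABLE_FDS constant in this class). In real applications, the number of file descriptors that must be handled concurrently can easily exceed that limit. The Windows JDK 11 implementation works around this with threads: if one thread can only handle a bounded number of file descriptors, a large set is simply split across several threads.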

Main data structures

Type	Field	Description
`SelectionKeyImpl[]`	channelArray	The list of SelectableChannels serviced by this Selector. Every mod MAX_SELECTABLE_FDS entry is bogus, to align this array with the poll array, where the corresponding entry is occupied by the wakeupSocket
`PollArrayWrapper`	pollWrapper	The global native poll array holds file descriptors and event masks
`List<SelectThread>`	threads	A list of helper threads for select.
`Pipe`	wakeupPipe	Pipe used as a wakeup object.
`FdMap`	fdMap	Maps file descriptors to their indices in pollArray
`SubSelector`	subSelector	SubSelector for the main thread
`Object`	interruptLock	Lock for interrupt triggering and clearing
`Object`	updateLock	pending new registrations/updates, queued by implRegister and setEventOps
`Deque<SelectionKeyImpl>`	newKeys
`Deque<SelectionKeyImpl>`	updateKeys

WindowsSelectorImpl.doSelect()

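How does JDK 11 on Windows select the SocketChannels that are in the requested state?

The abstraction layer's Selector.select() is implemented by SelectorImpl.select(), which mainly calls SelectorImpl.lockAndDoSelect(), which in turn calls SelectorImpl.doSelect(). In the Windows JDK 11 that method is implemented by WindowsSelectorImpl.doSelect().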

/**
 * A multi-threaded implementation of Selector for Windows.
 *
 * @author Konstantin Kladko
 * @author Mark Reinhold
 */

class WindowsSelectorImpl extends SelectorImpl {
    
    /* more */
    
    @Override
    protected int doSelect(Consumer<SelectionKey> action, long timeout)
        throws IOException
    {
        assert Thread.holdsLock(this);
        this.timeout = timeout; // set selector timeout
        processUpdateQueue();
        processDeregisterQueue();
        if (interruptTriggered) {
            resetWakeupSocket();
            return 0;
        }
        // Calculate number of helper threads needed for poll. If necessary
        // threads are created here and start waiting on startLock
        adjustThreadsCount();
        finishLock.reset(); // reset finishLock
        // Wakeup helper threads, waiting on startLock, so they start polling.
        // Redundant threads will exit here after wakeup.
        startLock.startThreads();
        // do polling in the main thread. Main thread is responsible for
        // first MAX_SELECTABLE_FDS entries in pollArray.
        try {
            begin();
            try {
                subSelector.poll();
            } catch (IOException e) {
                finishLock.setException(e); // Save this exception
            }
            // Main thread is out of poll(). Wakeup others and wait for them
            if (threads.size() > 0)
                finishLock.waitForHelperThreads();
          } finally {
              end();
          }
        // Done with poll(). Set wakeupSocket to nonsignaled  for the next run.
        finishLock.checkForException();
        processDeregisterQueue();
        int updated = updateSelectedKeys(action);
        // Done with poll(). Set wakeupSocket to nonsignaled  for the next run.
        resetWakeupSocket();
        return updated;
    }
    
    /* more */
    
}

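A call to WindowsSelectorImpl.doSelect() proceeds roughly as follows:

First, some state is brought up to date: new registrations and modifications are processed, and so is the set of cancelled keys.
Next, the number of threads required is computed and the helper threads needed for the multi-threaded poll are prepared.
1. If the main thread alone can handle the current number of descriptors, no helper threads need to be started;
2. If the main thread cannot handle that many descriptors on its own, helper threads are created and started to share the work.
The main thread takes its own share of the polling, namely subSelector.poll(): it performs the poll through its own subSelector.
If helper threads are involved, that is, when threads.size() > 0, finishLock.waitForHelperThreads() is used to wait for them to finish their work.
At that point the poll is complete. After some final checks and state updates, the method returns the number of keys updated by this doSelect call.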
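The division of labour described above can be sketched in plain Java. This is a deliberately simplified model, not the JDK code: ChunkedPollSketch, countReady and scan are hypothetical names, a boolean array stands in for the native poll array, and helper threads are created per call, whereas the JDK keeps them alive across calls and coordinates them through startLock and finishLock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class ChunkedPollSketch {

    static final int MAX_PER_THREAD = 1024; // plays the role of MAX_SELECTABLE_FDS

    /** Counts ready entries, scanning one chunk of the array per thread. */
    public static int countReady(boolean[] ready) throws InterruptedException {
        int chunks = Math.max(1, (ready.length + MAX_PER_THREAD - 1) / MAX_PER_THREAD);
        int helpers = chunks - 1; // the calling thread handles chunk 0 itself
        AtomicInteger total = new AtomicInteger();
        CountDownLatch finished = new CountDownLatch(helpers);

        for (int i = 1; i <= helpers; i++) {
            final int chunk = i;
            new Thread(() -> {
                total.addAndGet(scan(ready, chunk));
                finished.countDown(); // tell the main thread this helper is done
            }, "SelectorHelper-" + chunk).start();
        }

        total.addAndGet(scan(ready, 0)); // main thread polls the first chunk
        finished.await();                // wait for all helper threads
        return total.get();
    }

    private static int scan(boolean[] ready, int chunk) {
        int from = chunk * MAX_PER_THREAD;
        int to = Math.min(ready.length, from + MAX_PER_THREAD);
        int n = 0;
        for (int i = from; i < to; i++)
            if (ready[i]) n++;
        return n;
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] ready = new boolean[3000];
        for (int i = 0; i < ready.length; i += 3) ready[i] = true;
        System.out.println(countReady(ready) + " of " + ready.length + " entries ready");
    }
}
```

With 3000 entries the sketch uses the calling thread plus two helpers, which is the same "main thread plus helpers, then wait" shape as doSelect.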

WindowsSelectorImpl.SelectThread

WindowsSelectorImpl.SelectThread.run()

The helper threads are instances of WindowsSelectorImpl.SelectThread, and the heart of a thread class is its run method.

// Represents a helper thread used for select.
private final class SelectThread extends Thread {
    private final int index; // index of this thread
    final SubSelector subSelector;
    private long lastRun = 0; // last run number
    private volatile boolean zombie;
    // Creates a new thread
    private SelectThread(int i) {
        super(null, null, "SelectorHelper", 0, false);
        this.index = i;
        this.subSelector = new SubSelector(i);
        //make sure we wait for next round of poll
        this.lastRun = startLock.runsCounter;
    }
    void makeZombie() {
        zombie = true;
    }
    boolean isZombie() {
        return zombie;
    }
    public void run() {
        while (true) { // poll loop
            // wait for the start of poll. If this thread has become
            // redundant, then exit.
            if (startLock.waitForStart(this)) {
                subSelector.freeFDSetBuffer();
                return;
            }
            // call poll()
            try {
                subSelector.poll(index);
            } catch (IOException e) {
                // Save this exception and let other threads finish.
                finishLock.setException(e);
            }
            // notify main thread, that this thread has finished, and
            // wakeup others, if this thread is the first to finish.
            finishLock.threadFinished();
        }
    }
}

It is easy to see that the core operation of a helper thread is the call to subSelector.poll(index), which polls the file descriptors this thread is responsible for.

So what does this subSelector do?

WindowsSelectorImpl.SubSelector

Both the main thread and the helper threads described above own a subSelector instance, and both perform their poll by calling a poll method on that subSelector.

WindowsSelectorImpl.SubSelector.poll()

private final class SubSelector {
    private final int pollArrayIndex; // starting index in pollArray to poll
    // These arrays will hold result of native select().
    // The first element of each array is the number of selected sockets.
    // Other elements are file descriptors of selected sockets.
    private final int[] readFds = new int [MAX_SELECTABLE_FDS + 1];
    private final int[] writeFds = new int [MAX_SELECTABLE_FDS + 1];
    private final int[] exceptFds = new int [MAX_SELECTABLE_FDS + 1];
    // Buffer for readfds, writefds and exceptfds structs that are passed
    // to native select().
    private final long fdsBuffer = unsafe.allocateMemory(SIZEOF_FD_SET * 3);

    private SubSelector() {
        this.pollArrayIndex = 0; // main thread
    }

    private SubSelector(int threadIndex) { // helper threads
        this.pollArrayIndex = (threadIndex + 1) * MAX_SELECTABLE_FDS;
    }

    private int poll() throws IOException{ // poll for the main thread
        return poll0(pollWrapper.pollArrayAddress,
                     Math.min(totalChannels, MAX_SELECTABLE_FDS),
                     readFds, writeFds, exceptFds, timeout, fdsBuffer);
    }

    private int poll(int index) throws IOException {
        // poll for helper threads
        return  poll0(pollWrapper.pollArrayAddress +
                 (pollArrayIndex * PollArrayWrapper.SIZE_POLLFD),
                 Math.min(MAX_SELECTABLE_FDS,
                         totalChannels - (index + 1) * MAX_SELECTABLE_FDS),
                 readFds, writeFds, exceptFds, timeout, fdsBuffer);
    }

    private native int poll0(long pollAddress, int numfds,
         int[] readFds, int[] writeFds, int[] exceptFds, long timeout, long fdsBuffer);
    
    /* more */
    
}

The source shows that SubSelector's poll() and poll(index) are both thin adapters around poll0(), differing only in which slice of the poll array they pass to it.
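To make the slicing concrete, here is the index arithmetic from the two poll methods pulled out into a standalone sketch. PollSliceDemo and its methods are hypothetical names, and treating the main thread as thread index -1 is my own convention so that one formula covers both cases; the value 1024 for MAX_SELECTABLE_FDS is assumed here.

```java
public class PollSliceDemo {

    static final int MAX_SELECTABLE_FDS = 1024; // assumed value of the JDK constant

    /** First poll-array index of a slice; pass -1 for the main thread. */
    public static int sliceStart(int threadIndex) {
        return (threadIndex + 1) * MAX_SELECTABLE_FDS;
    }

    /** Number of entries in a slice, given the total number of channels. */
    public static int sliceLength(int threadIndex, int totalChannels) {
        return Math.min(MAX_SELECTABLE_FDS, totalChannels - sliceStart(threadIndex));
    }

    public static void main(String[] args) {
        int totalChannels = 2500;
        for (int t = -1; t <= 1; t++) {
            String who = (t < 0) ? "main thread" : "helper " + t;
            System.out.println(who + ": start=" + sliceStart(t)
                    + ", length=" + sliceLength(t, totalChannels));
        }
        // main thread: start=0, length=1024
        // helper 0: start=1024, length=1024
        // helper 1: start=2048, length=452
    }
}
```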

WindowsSelectorImpl.SubSelector.poll0()

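As the source above shows, poll0 is not implemented in Java; it is a native method called through JNI. The comments in SubSelector ("result of native select()") suggest that the native side is built on select().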

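Implementation in the Linux JDK 11

In the Linux JDK 11, what gets instantiated and returned to the upper layer is sun.nio.ch.EPollSelectorProvider.

EPollSelectorProvider

Similarly, the Linux JDK 11 exposes its external entry point through EPollSelectorProvider.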

public class EPollSelectorProvider
    extends SelectorProviderImpl
{
    public AbstractSelector openSelector() throws IOException {
        return new EPollSelectorImpl(this);
    }

    public Channel inheritedChannel() throws IOException {
        return InheritedChannel.getChannel();
    }
}

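The openSelector method simply instantiates the EPollSelectorImpl implementation class and returns it.

EPollSelectorImpl

EPollSelectorImpl.doSelect()

How does JDK 11 on Linux select the SocketChannels that are in the requested state?

In essence, it calls EPoll.wait, which returns the number of file descriptors that are ready.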

/**
 * Linux epoll based Selector implementation
 */

class EPollSelectorImpl extends SelectorImpl {
    
    /* more */
    
    @Override
    protected int doSelect(Consumer<SelectionKey> action, long timeout)
        throws IOException
    {
        assert Thread.holdsLock(this);

        // epoll_wait timeout is int
        int to = (int) Math.min(timeout, Integer.MAX_VALUE);
        boolean blocking = (to != 0);
        boolean timedPoll = (to > 0);

        int numEntries;
        processUpdateQueue();
        processDeregisterQueue();
        try {
            begin(blocking);

            do {
                long startTime = timedPoll ? System.nanoTime() : 0;
                numEntries = EPoll.wait(epfd, pollArrayAddress, NUM_EPOLLEVENTS, to);
                if (numEntries == IOStatus.INTERRUPTED && timedPoll) {
                    // timed poll interrupted so need to adjust timeout
                    long adjust = System.nanoTime() - startTime;
                    to -= TimeUnit.MILLISECONDS.convert(adjust, TimeUnit.NANOSECONDS);
                    if (to <= 0) {
                        // timeout expired so no retry
                        numEntries = 0;
                    }
                }
            } while (numEntries == IOStatus.INTERRUPTED);
            assert IOStatus.check(numEntries);

        } finally {
            end(blocking);
        }
        processDeregisterQueue();
        return processEvents(numEntries, action);
    }
    
    /* more */
}

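Specifically, EPollSelectorImpl.doSelect follows much the same pattern as the WindowsSelectorImpl implementation:

First, state is checked and updated by processing the update queue and the deregister queue;
Then EPoll.wait is called to obtain the number of I/O file descriptors that are ready, retrying with a reduced timeout if the wait is interrupted;
Finally, state is updated again and the number of keys updated by this doSelect call is returned.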
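The retry loop in the middle of doSelect is worth isolating, since it is easy to get wrong: an interrupted timed wait must not restart with the full timeout. Below is a sketch of that loop with the native call replaced by a function argument. TimedRetrySketch and waitWithRetry are hypothetical names, and INTERRUPTED is a stand-in constant whose value I am assuming rather than quoting from IOStatus.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.IntUnaryOperator;

public class TimedRetrySketch {

    public static final int INTERRUPTED = -3; // stand-in for IOStatus.INTERRUPTED (assumed value)

    /**
     * Calls waitFn(timeoutMillis) until it returns something other than
     * INTERRUPTED, shrinking the timeout by the time already spent.
     */
    public static int waitWithRetry(IntUnaryOperator waitFn, int timeoutMillis) {
        int to = timeoutMillis;
        boolean timedPoll = (to > 0);
        int n;
        do {
            long start = timedPoll ? System.nanoTime() : 0;
            n = waitFn.applyAsInt(to);
            if (n == INTERRUPTED && timedPoll) {
                // Deduct the elapsed time so the overall deadline is kept.
                long elapsed = System.nanoTime() - start;
                to -= TimeUnit.MILLISECONDS.convert(elapsed, TimeUnit.NANOSECONDS);
                if (to <= 0)
                    n = 0; // time budget used up: report "nothing ready"
            }
        } while (n == INTERRUPTED);
        return n;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fake wait function: interrupted twice, then reports 5 ready events.
        int ready = waitWithRetry(t -> ++calls[0] < 3 ? INTERRUPTED : 5, 1000);
        System.out.println(ready + " events after " + calls[0] + " calls");
    }
}
```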

EPoll

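epoll is the I/O multiplexing facility provided by the Linux kernel, and JDK 11 uses it through JNI.

In JDK 11 the EPoll class is just a thin wrapper; the JDK is not responsible for implementing epoll itself.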

/**
 * Provides access to the Linux epoll facility.
 */

class EPoll {
    private EPoll() { }

    private static final Unsafe unsafe = Unsafe.getUnsafe();

    /**
     * typedef union epoll_data {
     *     void *ptr;
     *     int fd;
     *     __uint32_t u32;
     *     __uint64_t u64;
     *  } epoll_data_t;
     *
     * struct epoll_event {
     *     __uint32_t events;
     *     epoll_data_t data;
     * }
     */
    private static final int SIZEOF_EPOLLEVENT   = eventSize();
    private static final int OFFSETOF_EVENTS     = eventsOffset();
    private static final int OFFSETOF_FD         = dataOffset();

    // opcodes
    static final int EPOLL_CTL_ADD  = 1;
    static final int EPOLL_CTL_DEL  = 2;
    static final int EPOLL_CTL_MOD  = 3;

    // events
    static final int EPOLLIN   = 0x1;
    static final int EPOLLOUT  = 0x4;

    // flags
    static final int EPOLLONESHOT   = (1 << 30);

    /**
     * Allocates a poll array to handle up to {@code count} events.
     */
    static long allocatePollArray(int count) {
        return unsafe.allocateMemory(count * SIZEOF_EPOLLEVENT);
    }

    /**
     * Free a poll array
     */
    static void freePollArray(long address) {
        unsafe.freeMemory(address);
    }

    /**
     * Returns event[i];
     */
    static long getEvent(long address, int i) {
        return address + (SIZEOF_EPOLLEVENT*i);
    }

    /**
     * Returns event->data.fd
     */
    static int getDescriptor(long eventAddress) {
        return unsafe.getInt(eventAddress + OFFSETOF_FD);
    }

    /**
     * Returns event->events
     */
    static int getEvents(long eventAddress) {
        return unsafe.getInt(eventAddress + OFFSETOF_EVENTS);
    }

    // -- Native methods --

    private static native int eventSize();

    private static native int eventsOffset();

    private static native int dataOffset();

    static native int create() throws IOException;

    static native int ctl(int epfd, int opcode, int fd, int events);

    static native int wait(int epfd, long pollAddress, int numfds, int timeout)
        throws IOException;

    static {
        IOUtil.load();
    }
}

EPoll.wait

static native int wait(int epfd, long pollAddress, int numfds, int timeout) throws IOException;

This method presumably calls the Linux epoll_wait system call. According to the Linux manual page shown by man epoll_wait:

The epoll_wait() system call waits for events on the epoll(7) instance referred to by the file descriptor epfd. The memory area pointed to by events will contain the events that will be available for the caller. Up to maxevents are returned by epoll_wait(). The maxevents argument must be greater than zero.
The timeout argument specifies the number of milliseconds that epoll_wait() will block.

In other words, the EPoll.wait method called by the JDK blocks for up to timeout milliseconds waiting for events on the epoll instance referred to by the file descriptor epfd; once events occur, it returns an integer giving the number of events.

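Summary

Tracing from the surface-level Selector down to WindowsSelectorImpl and EPollSelectorImpl, layer by layer, shows how clearly the JDK's design separates abstraction from implementation, following the Dependency Inversion Principle: high-level callers should not depend on low-level implementations, low-level implementations should not be tailored to high-level callers, and both should depend on abstractions.

Since the Linux kernel already provides the convenient epoll multiplexing facility, which is sufficient for handling large numbers of concurrent connections, JDK 11 only needs to call the epoll-related system calls natively through JNI, and its implementation is comparatively simple. Windows does not provide a multiplexing model like epoll. To get around the limit on the number of connections a single poll can handle, JDK 11 applies divide and conquer, bringing in helper threads to share the load; this dynamic multi-threaded poll is a neat way to handle large numbers of concurrent connections.

In the end, on Windows and Linux alike, understanding the deeper workings of I/O multiplexing requires studying how it is implemented at the operating-system level.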

HTTP Server Stress Testing

2020-07-30T03:40:50.000Z

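This post stress-tests HearyHTTPd. There are many stress-testing tools available; I tried two of them, Apache Benchmark (ab) and WebBench, and ran the tests on my desktop machine and on the lab's compute servers. The measured QPS was roughly 6,400+, 21,000+ and 30,000+ respectively.

HTTP Server Stress Testing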

1 Apache benchmark

1.1 Introduction

Apache Benchmark
ab - Apache HTTP server benchmarking tool
ab is a tool for benchmarking your Apache Hypertext Transfer Protocol (HTTP) server. It is designed to give you an impression of how your current Apache installation performs. This especially shows you how many requests per second your Apache installation is capable of serving.

Apache Benchmark is an HTTP server stress-testing tool provided by Apache; it is installed along with Apache.

1.2 Installation

sudo apt install apache2-utils

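1.3 Testing

I tested my HearyHTTPd with 1,000 concurrent clients issuing 100,000 requests in total.

The first test ran on my lab desktop:

CPU: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
Memory: single-channel 8 GB DDR4 2400 MHz
ab ran in Ubuntu under WSL and tested hhttpd running on the JRE in the Windows environment.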

jyshen@JYSHEN-WORKPC:~$ ab -n 100000 -c 1000 http://localhost:8080/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests


Server Software:
Server Hostname:        localhost
Server Port:            8080

Document Path:          /
Document Length:        0 bytes

Concurrency Level:      1000
Time taken for tests:   15.595 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      1300000 bytes
HTML transferred:       0 bytes
Requests per second:    6412.45 [#/sec] (mean)
Time per request:       155.947 [ms] (mean)
Time per request:       0.156 [ms] (mean, across all concurrent requests)
Transfer rate:          81.41 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       33   78  20.6     78     143
Processing:    29   77  20.8     77     147
Waiting:        4   45  16.8     41     146
Total:         72  155   7.7    154     195

Percentage of the requests served within a certain time (ms)
  50%    154
  66%    155
  75%    156
  80%    156
  90%    163
  95%    169
  98%    173
  99%    184
 100%    195 (longest request)

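The result: on average 6,412 requests were handled per second.
Repeated runs stayed consistently above 6,000.
Memory and CPU consumption showed nothing abnormal during the test.
Other factors: during the experiment, the CPU usage of Windows Defender's Antimalware Service Executable process rose noticeably, to about 7~10%.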
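As a sanity check on the ab report, the three headline figures can be recomputed from the request count, the concurrency level and the elapsed time. The formulas below reflect my understanding of how ab derives them, so treat this as a sketch; AbMetrics and its method names are made up. Small differences from the report are expected because ab works with a more precise elapsed time than the printed 15.595 seconds.

```java
public class AbMetrics {

    /** Mean throughput: completed requests divided by elapsed seconds. */
    public static double requestsPerSecond(int requests, double seconds) {
        return requests / seconds;
    }

    /** Mean time per request as seen by one of the concurrent clients, in ms. */
    public static double meanTimePerRequestMs(int concurrency, int requests, double seconds) {
        return concurrency * seconds * 1000.0 / requests;
    }

    /** Mean time per request across all concurrent clients, in ms. */
    public static double timePerRequestAcrossAllMs(int requests, double seconds) {
        return seconds * 1000.0 / requests;
    }

    public static void main(String[] args) {
        int requests = 100_000;
        int concurrency = 1_000;
        double seconds = 15.595; // "Time taken for tests" from the desktop run
        System.out.printf("requests/s:            %.2f%n", requestsPerSecond(requests, seconds));
        System.out.printf("ms/request (mean):     %.3f%n",
                meanTimePerRequestMs(concurrency, requests, seconds));
        System.out.printf("ms/request (all conc): %.3f%n",
                timePerRequestAcrossAllMs(requests, seconds));
    }
}
```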

I also ran the test on the lab's new server:

CPU: 2 x Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz
Memory: 512 GB (16 x 32 GB) ECC DDR4 2666 MHz

(base) sjy@h3c-UniServer-R5200-G3:~$ ab -n 100000 -c 1000 http://localhost:8080/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests


Server Software:        
Server Hostname:        localhost
Server Port:            8080

Document Path:          /
Document Length:        0 bytes

Concurrency Level:      1000
Time taken for tests:   4.753 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      1300000 bytes
HTML transferred:       0 bytes
Requests per second:    21041.56 [#/sec] (mean)
Time per request:       47.525 [ms] (mean)
Time per request:       0.048 [ms] (mean, across all concurrent requests)
Transfer rate:          267.13 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        7   24  63.7     20    1052
Processing:     6   23   6.1     24      49
Waiting:        5   17   5.1     16      44
Total:         21   47  64.3     46    1084

Percentage of the requests served within a certain time (ms)
  50%     46
  66%     47
  75%     48
  80%     49
  90%     51
  95%     53
  98%     56
  99%     62
 100%   1084 (longest request)

The result: on average 21,041 requests were handled per second.

I ran the test on another compute server as well:

CPU: 2 x Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
Memory: 128 GB (4 x 32 GB) ECC DDR4 2666 MHz

jyshen@ubuntu:~$ ab -c 1000 -n 100000 http://127.0.0.1:8080/
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests


Server Software:        
Server Hostname:        127.0.0.1
Server Port:            8080

Document Path:          /
Document Length:        152 bytes

Concurrency Level:      1000
Time taken for tests:   3.296 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      21500000 bytes
HTML transferred:       15200000 bytes
Requests per second:    30338.50 [#/sec] (mean)
Time per request:       32.961 [ms] (mean)
Time per request:       0.033 [ms] (mean, across all concurrent requests)
Transfer rate:          6369.90 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   16  63.0     13    1015
Processing:     3   17   5.1     16      56
Waiting:        3   13   4.9     11      55
Total:          6   33  63.5     31    1042

Percentage of the requests served within a certain time (ms)
  50%     31
  66%     32
  75%     33
  80%     34
  90%     35
  95%     36
  98%     40
  99%     44
 100%   1042 (longest request)

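The result: on average 30,338 requests were handled per second.

2 WebBench

2.1 Introduction

GitHub: WebBench

Webbench is a very simple website stress-testing tool for Linux, written by Radim Kolar in 1997. It uses fork() to simulate multiple clients accessing a given URL at the same time in order to measure how a site performs under load, and it can simulate up to 30,000 concurrent connections. Official site: http://home.tiscali.cz/~cz210552/webbench.html

2.2 Installation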

# prerequisite
sudo apt update
sudo apt install build-essential

# install webbench
wget http://home.tiscali.cz/~cz210552/distfiles/webbench-1.5.tar.gz
tar -zxvf webbench-1.5.tar.gz
cd webbench-1.5/
sudo make
sudo make install

2.3 Testing

Run 10,000 concurrent clients for 1 second.

On my lab desktop:

jyshen@JYSHEN-WORKPC:~$ webbench -c 10000 -t 1 http://localhost:8080/
Webbench - Simple Web Benchmark 1.5
Copyright (c) Radim Kolar 1997-2004, GPL Open Source Software.

Benchmarking: GET http://localhost:8080/
10000 clients, running 1 sec.

Speed=39851336 pages/min, 8545485 bytes/sec.
Requests: 664189 susceed, 0 failed.

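More than 660,000 requests were issued by the 10,000 concurrent clients, and all of them were handled successfully.

The lab server could not fork the same 10,000 child processes and reported: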

(base) sjy@h3c-UniServer-R5200-G3:~/HearyHTTPd/webbench-1.5$ webbench -c 10000 -t 1 http://localhost:8080/
Webbench - Simple Web Benchmark 1.5
Copyright (c) Radim Kolar 1997-2004, GPL Open Source Software.

Benchmarking: GET http://localhost:8080/
10000 clients, running 1 sec.
problems forking worker no. 8841
fork failed.: Resource temporarily unavailable

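A quick look with htop showed that quite a few compute jobs from other students were still running on the server, so a comparison is not possible for now.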

ThreadPoolExecutor - Understanding How Java Thread Pools Work from the JDK 11 Source

2020-07-28T13:10:29.000Z

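While developing HearyHTTPd, I used Java's thread pool mechanism to handle concurrent requests efficiently with multiple threads. I read through the thread pool implementation in the JDK 11 source, and this post walks through how it works.

ThreadPoolExecutor - Understanding How Java Thread Pools Work from the JDK 11 Source

1 Surface layer: Executors

The JDK's Executors class exposes static factory methods for quickly creating a thread pool. This post looks at three of them: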

1.1 newSingleThreadExecutor

public static ExecutorService newSingleThreadExecutor()

Degenerates into a "thread pool" that contains only one thread.

1.2 newFixedThreadPool

public static ExecutorService newFixedThreadPool(int nThreads)

A thread pool with a fixed number of threads.

1.3 newCachedThreadPool

public static ExecutorService newCachedThreadPool()

A thread pool that creates threads on demand.

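2 One layer deeper: ThreadPoolExecutor

In fact, all three of the static methods above instantiate the same type: ThreadPoolExecutor. This class extends the abstract class java.util.concurrent.AbstractExecutorService, which implements the ExecutorService interface, which in turn extends the Executor interface. ExecutorService is the abstract interface through which outside code normally uses a thread pool instance.

ThreadPoolExecutor provides several constructors; the simplest is: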

/**
 * Creates a new {@code ThreadPoolExecutor} with the given initial
 * parameters, the default thread factory and the default rejected
 * execution handler.
 *
 * It may be more convenient to use one of the {@link Executors}
 * factory methods instead of this general purpose constructor.
 *
 * @param corePoolSize the number of threads to keep in the pool, even
 *        if they are idle, unless {@code allowCoreThreadTimeOut} is set
 * @param maximumPoolSize the maximum number of threads to allow in the
 *        pool
 * @param keepAliveTime when the number of threads is greater than
 *        the core, this is the maximum time that excess idle threads
 *        will wait for new tasks before terminating.
 * @param unit the time unit for the {@code keepAliveTime} argument
 * @param workQueue the queue to use for holding tasks before they are
 *        executed.  This queue will hold only the {@code Runnable}
 *        tasks submitted by the {@code execute} method.
 * @throws IllegalArgumentException if one of the following holds:
 *         {@code corePoolSize < 0}
 *         {@code keepAliveTime < 0}
 *         {@code maximumPoolSize <= 0}
 *         {@code maximumPoolSize < corePoolSize}
 * @throws NullPointerException if {@code workQueue} is null
 */
public ThreadPoolExecutor(int corePoolSize,
                          int maximumPoolSize,
                          long keepAliveTime,
                          TimeUnit unit,
                          BlockingQueue<Runnable> workQueue) {
    this(corePoolSize, maximumPoolSize, keepAliveTime, unit, workQueue,
         Executors.defaultThreadFactory(), defaultHandler);
}

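This lets you set a series of thread pool parameters:

Core pool size: how many threads the pool keeps alive even when they are idle;
Maximum pool size: the maximum number of threads the pool may create;
Keep-alive time: when the thread count exceeds the core size, how long an excess thread may stay idle before it is terminated;
Time unit: the unit of the keep-alive time, which can be nanoseconds, microseconds, milliseconds, seconds, minutes, hours, or days;
Work queue: holds tasks submitted to the pool that are waiting to be executed.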
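To make the five parameters concrete, here is a minimal sketch that builds a pool directly with this constructor and reads the settings back. The class name PoolConfigDemo and the values (2 core threads, 4 maximum, 30 seconds, a queue bounded to 100) are arbitrary choices of mine for illustration, not something taken from HearyHTTPd.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolConfigDemo {
    // Builds a pool with the five-argument constructor and reports its settings.
    // All concrete values below are example values, not recommendations.
    static String describe() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2,                               // corePoolSize
                4,                               // maximumPoolSize
                30L,                             // keepAliveTime
                TimeUnit.SECONDS,                // unit of keepAliveTime
                new LinkedBlockingQueue<>(100)); // workQueue, bounded to 100 tasks
        try {
            return "core=" + pool.getCorePoolSize()
                    + " max=" + pool.getMaximumPoolSize()
                    + " keepAlive=" + pool.getKeepAliveTime(TimeUnit.SECONDS) + "s"
                    + " queueCapacity=" + pool.getQueue().remainingCapacity();
        } finally {
            pool.shutdown(); // no tasks were submitted, so this returns at once
        }
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

No thread is started until the first task arrives, so constructing the pool is cheap.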

2.1 newSingleThreadExecutor

/**
 * Creates an Executor that uses a single worker thread operating
 * off an unbounded queue. (Note however that if this single
 * thread terminates due to a failure during execution prior to
 * shutdown, a new one will take its place if needed to execute
 * subsequent tasks.)  Tasks are guaranteed to execute
 * sequentially, and no more than one task will be active at any
 * given time. Unlike the otherwise equivalent
 * {@code newFixedThreadPool(1)} the returned executor is
 * guaranteed not to be reconfigurable to use additional threads.
 *
 * @return the newly created single-threaded Executor
 */
public static ExecutorService newSingleThreadExecutor() {
    return new FinalizableDelegatedExecutorService
        (new ThreadPoolExecutor(1, 1,
                                0L, TimeUnit.MILLISECONDS,
                                new LinkedBlockingQueue<Runnable>()));
}

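newSingleThreadExecutor sets the parameters as follows:

The core and maximum pool sizes are both 1, so the pool holds at most one thread.
Because the core and maximum sizes are equal, there are never excess threads to time out, so the keep-alive time is 0.
Tasks submitted to the pool are placed in a LinkedBlockingQueue instance.
- This blocking queue is unbounded by default (a capacity can optionally be given to limit memory use).

The FinalizableDelegatedExecutorService wrapped around it is a static nested class defined in the Executors class: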

private static class FinalizableDelegatedExecutorService
        extends DelegatedExecutorService {
    FinalizableDelegatedExecutorService(ExecutorService executor) {
        super(executor);
    }
    @SuppressWarnings("deprecation")
    protected void finalize() {
        super.shutdown();
    }
}

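It only implements the finalize method, which shuts the thread pool down.

This class in turn extends another static nested class, DelegatedExecutorService, a wrapper that restricts which methods are exposed to callers: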

/**
 * A wrapper class that exposes only the ExecutorService methods
 * of an ExecutorService implementation.
 */
private static class DelegatedExecutorService
        implements ExecutorService {
    // ...
}
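One visible effect of the wrapper: the executor returned by newSingleThreadExecutor cannot be cast to ThreadPoolExecutor, so a caller cannot reach setCorePoolSize and similar methods, whereas newFixedThreadPool(1) hands back the ThreadPoolExecutor itself. A small sketch to check this (the class SingleVsFixedDemo and its method names are my own, hypothetical names):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

class SingleVsFixedDemo {
    // True if the executor can be cast to ThreadPoolExecutor and thus re-tuned.
    static boolean isReconfigurable(ExecutorService executor) {
        return executor instanceof ThreadPoolExecutor;
    }

    static boolean singleIsReconfigurable() {
        ExecutorService single = Executors.newSingleThreadExecutor();
        try {
            return isReconfigurable(single); // wrapped, so the pool is hidden
        } finally {
            single.shutdown();
        }
    }

    static boolean fixedIsReconfigurable() {
        ExecutorService fixed = Executors.newFixedThreadPool(1);
        try {
            return isReconfigurable(fixed); // the raw ThreadPoolExecutor
        } finally {
            fixed.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("single reconfigurable: " + singleIsReconfigurable());
        System.out.println("fixed(1) reconfigurable: " + fixedIsReconfigurable());
    }
}
```

This matches the Javadoc remark that, unlike newFixedThreadPool(1), the single-thread executor is guaranteed not to be reconfigurable.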

2.2 newFixedThreadPool

/**
 * Creates a thread pool that reuses a fixed number of threads
 * operating off a shared unbounded queue.  At any point, at most
 * {@code nThreads} threads will be active processing tasks.
 * If additional tasks are submitted when all threads are active,
 * they will wait in the queue until a thread is available.
 * If any thread terminates due to a failure during execution
 * prior to shutdown, a new one will take its place if needed to
 * execute subsequent tasks.  The threads in the pool will exist
 * until it is explicitly {@link ExecutorService#shutdown shutdown}.
 *
 * @param nThreads the number of threads in the pool
 * @return the newly created thread pool
 * @throws IllegalArgumentException if {@code nThreads <= 0}
 */
public static ExecutorService newFixedThreadPool(int nThreads) {
    return new ThreadPoolExecutor(nThreads, nThreads,
                                  0L, TimeUnit.MILLISECONDS,
                                  new LinkedBlockingQueue<Runnable>());
}

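newFixedThreadPool sets the parameters as follows:

The core and maximum pool sizes are both the argument nThreads, so the pool never holds more than nThreads threads (they are created as tasks arrive).
Because the core and maximum sizes are equal, there are never excess threads to time out, so the keep-alive time is 0.
Tasks submitted to the pool are placed in a LinkedBlockingQueue instance.

Apart from allowing more than one thread, the parameters are very similar to those of the single-thread pool above.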
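A quick way to see the fixed pool's behavior is to submit more blocking tasks than there are threads and look at where the surplus goes. The sketch below is my own illustration (FixedPoolDemo is a hypothetical name); it relies on the fact that newFixedThreadPool returns a ThreadPoolExecutor, so the cast is safe.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

class FixedPoolDemo {
    // Submits `tasks` blocking tasks to a fixed pool of `nThreads` threads and
    // returns {pool size, queued task count} while every task is still blocked.
    static int[] poolSizeAndQueued(int nThreads, int tasks) {
        ThreadPoolExecutor pool =
                (ThreadPoolExecutor) Executors.newFixedThreadPool(nThreads);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep the worker busy until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return new int[] {pool.getPoolSize(), pool.getQueue().size()};
        } finally {
            release.countDown(); // let all tasks finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] r = poolSizeAndQueued(2, 5);
        System.out.println("threads=" + r[0] + " queued=" + r[1]); // threads=2 queued=3
    }
}
```

With 2 threads and 5 tasks, the first two tasks each start a worker and the remaining three wait in the LinkedBlockingQueue.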

2.3 newCachedThreadPool

/**
 * Creates a thread pool that creates new threads as needed, but
 * will reuse previously constructed threads when they are
 * available.  These pools will typically improve the performance
 * of programs that execute many short-lived asynchronous tasks.
 * Calls to {@code execute} will reuse previously constructed
 * threads if available. If no existing thread is available, a new
 * thread will be created and added to the pool. Threads that have
 * not been used for sixty seconds are terminated and removed from
 * the cache. Thus, a pool that remains idle for long enough will
 * not consume any resources. Note that pools with similar
 * properties but different details (for example, timeout parameters)
 * may be created using {@link ThreadPoolExecutor} constructors.
 *
 * @return the newly created thread pool
 */
public static ExecutorService newCachedThreadPool() {
    return new ThreadPoolExecutor(0, Integer.MAX_VALUE,
                                  60L, TimeUnit.SECONDS,
                                  new SynchronousQueue<Runnable>());
}

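newCachedThreadPool sets the parameters as follows:

The core pool size is 0, so an idle pool eventually keeps no threads at all.
The maximum pool size is Integer.MAX_VALUE, so the number of threads can grow very large if needed.
The keep-alive time is 60 seconds: an idle thread is not destroyed immediately, and if a new task arrives within those 60 seconds the idle thread is reused.
Tasks submitted to the pool are handed to workers through a SynchronousQueue instance.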
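For contrast with the fixed pool, the following sketch submits several blocking tasks to a cached pool and checks that every task gets its own thread while nothing is queued. CachedPoolDemo is a hypothetical name of mine; the cast works because newCachedThreadPool also returns a ThreadPoolExecutor.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

class CachedPoolDemo {
    // Submits `tasks` blocking tasks to a cached pool and returns
    // {pool size, queued task count} while every task is still blocked.
    static int[] poolSizeAndQueued(int tasks) {
        ThreadPoolExecutor pool =
                (ThreadPoolExecutor) Executors.newCachedThreadPool();
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // no worker is idle, so each task needs a new thread
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return new int[] {pool.getPoolSize(), pool.getQueue().size()};
        } finally {
            release.countDown();
            pool.shutdown(); // also wakes idle workers so they exit without waiting 60 s
        }
    }

    public static void main(String[] args) {
        int[] r = poolSizeAndQueued(4);
        System.out.println("threads=" + r[0] + " queued=" + r[1]); // threads=4 queued=0
    }
}
```

The unbounded growth shown here is also the risk of this configuration: a burst of slow tasks creates one thread per task.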

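3 Deeper still: from AbstractExecutorService to ThreadPoolExecutor

3.1 Where task submission begins: submit

Code that uses a thread pool from the outside typically calls the submit method. It is declared in the ExecutorService interface and implemented in the abstract class AbstractExecutorService.

Specifically, AbstractExecutorService implements submit as: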

/**
 * @throws RejectedExecutionException {@inheritDoc}
 * @throws NullPointerException       {@inheritDoc}
 */
public Future<?> submit(Runnable task) {
    if (task == null) throw new NullPointerException();
    RunnableFuture<Void> ftask = newTaskFor(task, null);
    execute(ftask);
    return ftask;
}

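Here, newTaskFor wraps the input, whether it is a Runnable or a Callable, in a FutureTask instance and returns it through the RunnableFuture interface.

Then execute(ftask) is called to run the newly submitted task. That method is implemented in the abstract class's subclass, ThreadPoolExecutor.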
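From the caller's side this wrapping is visible through the returned Future. The sketch below (SubmitDemo and its method names are hypothetical) submits a Callable and a Runnable to a plain ThreadPoolExecutor-backed pool and checks that the Runnable's Future is a FutureTask whose result is null.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

class SubmitDemo {
    // Submits a Callable and reads its result back through the Future.
    static int submitCallable() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            Callable<Integer> job = () -> 6 * 7;
            Future<Integer> future = pool.submit(job);
            return future.get(); // blocks until the worker has run the job
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Submits a Runnable; its Future is a FutureTask that completes with null.
    static boolean runnableIsWrappedInFutureTask() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            Runnable job = () -> { };
            Future<?> future = pool.submit(job);
            return future instanceof FutureTask && future.get() == null;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(submitCallable());                // 42
        System.out.println(runnableIsWrappedInFutureTask()); // true
    }
}
```

A subclass may override newTaskFor to return a different RunnableFuture, so the instanceof check holds for the stock implementation only.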

3.2 Starting to run a task: execute

Once submitted, a task has to be run. The execute(ftask) call made in submit is implemented in ThreadPoolExecutor.execute:

/**
 * Executes the given task sometime in the future.  The task
 * may execute in a new thread or in an existing pooled thread.
 *
 * If the task cannot be submitted for execution, either because this
 * executor has been shutdown or because its capacity has been reached,
 * the task is handled by the current {@link RejectedExecutionHandler}.
 *
 * @param command the task to execute
 * @throws RejectedExecutionException at discretion of
 *         {@code RejectedExecutionHandler}, if the task
 *         cannot be accepted for execution
 * @throws NullPointerException if {@code command} is null
 */
public void execute(Runnable command) {
    if (command == null)
        throw new NullPointerException();
    /*
     * Proceed in 3 steps:
     *
     * 1. If fewer than corePoolSize threads are running, try to
     * start a new thread with the given command as its first
     * task.  The call to addWorker atomically checks runState and
     * workerCount, and so prevents false alarms that would add
     * threads when it shouldn't, by returning false.
     *
     * 2. If a task can be successfully queued, then we still need
     * to double-check whether we should have added a thread
     * (because existing ones died since last checking) or that
     * the pool shut down since entry into this method. So we
     * recheck state and if necessary roll back the enqueuing if
     * stopped, or start a new thread if there are none.
     *
     * 3. If we cannot queue task, then we try to add a new
     * thread.  If it fails, we know we are shut down or saturated
     * and so reject the task.
     */
    int c = ctl.get();
    if (workerCountOf(c) < corePoolSize) {
        if (addWorker(command, true))
            return;
        c = ctl.get();
    }
    if (isRunning(c) && workQueue.offer(command)) {
        int recheck = ctl.get();
        if (! isRunning(recheck) && remove(command))
            reject(command);
        else if (workerCountOf(recheck) == 0)
            addWorker(null, false);
    }
    else if (!addWorker(command, false))
        reject(command);
}

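As the code shows, when there are few worker threads, fewer than the core pool size, the method adds a new thread and hands the incoming Runnable command to it.

Once the core pool size has been reached, the method performs a series of careful checks and tries to add the Runnable command to workQueue. In detail:

It checks the pool's control word to see whether the pool is running; if so, it tries to enqueue the task with workQueue.offer(command):

If the task is enqueued successfully (true), a worker thread will pick it up. The typical work queue LinkedBlockingQueue falls in this category; in that case the thread count never exceeds corePoolSize.
- To avoid a heavyweight lock, the code re-reads the atomic control word ctl after enqueuing, guarding against the pool having shut down or having zero threads once the task is in the queue.
  - If the second check finds that the pool is no longer running, it removes the task it just added and rejects it (reject calls handler.rejectedExecution(command, this) on the RejectedExecutionHandler instance, which handles tasks the pool refuses);
  - If the pool is still running but has no worker threads, it creates a new thread to process the task just added to the queue.
If the task cannot be enqueued (false), a new worker thread has to be started, up to maximumPoolSize; if that fails as well, the task is rejected. The typical work queue SynchronousQueue falls in this category: unless some thread is blocked reading from it, a new task cannot be inserted and offer returns false. A new thread is then started, since a task can only be handed over to a thread that is blocked reading. This means the thread count can well exceed the core size given by corePoolSize.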
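The three steps can be observed with a deliberately tiny pool. This is a sketch under my own assumptions: 1 core thread, 2 maximum threads, an ArrayBlockingQueue of capacity 1, and the default rejection policy; ExecuteStepsDemo is a hypothetical name.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ExecuteStepsDemo {
    // Submits four blocking tasks and records what execute() did with each.
    static String trace() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(1));
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocking = () -> {
            try {
                release.await(); // hold the worker so the pool state stays put
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        StringBuilder log = new StringBuilder();
        try {
            for (int i = 1; i <= 4; i++) {
                try {
                    pool.execute(blocking);
                    log.append("task").append(i)
                       .append(": threads=").append(pool.getPoolSize())
                       .append(" queued=").append(pool.getQueue().size())
                       .append('\n');
                } catch (RejectedExecutionException e) {
                    log.append("task").append(i).append(": rejected\n");
                }
            }
            return log.toString();
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.print(trace());
    }
}
```

Task 1 starts the core thread (step 1), task 2 is queued (step 2), task 3 finds the queue full and starts a second thread (step 3), and task 4 is rejected because both the queue and the pool are saturated.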

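3.3 The work queue: BlockingQueue

The work queue is a blocking queue used to solve the producer-consumer problem. In other words, execute plays the producer: it puts checked tasks into the work queue, from which the pool's worker threads take them and run them.

In ThreadPoolExecutor, the work queue workQueue is a BlockingQueue: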

/**
 * The queue used for holding tasks and handing off to worker
 * threads.  We do not require that workQueue.poll() returning
 * null necessarily means that workQueue.isEmpty(), so rely
 * solely on isEmpty to see if the queue is empty (which we must
 * do for example when deciding whether to transition from
 * SHUTDOWN to TIDYING).  This accommodates special-purpose
 * queues such as DelayQueues for which poll() is allowed to
 * return null even if it may later return non-null when delays
 * expire.
 */
private final BlockingQueue<Runnable> workQueue;
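The producer-consumer relationship described in this section can be sketched outside the thread pool with a bare BlockingQueue. In this illustration (ProducerConsumerDemo is a hypothetical name) the calling thread plays the role of execute and a single consumer thread plays the role of a worker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ProducerConsumerDemo {
    // The caller produces `count` integers; one consumer thread takes them.
    static List<Integer> transfer(int count) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        List<Integer> consumed = new ArrayList<>();
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    consumed.add(queue.take()); // blocks while the queue is empty
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            for (int i = 0; i < count; i++) {
                queue.put(i); // producer side
            }
            consumer.join(); // after join, the consumer's writes are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(transfer(3)); // [0, 1, 2]
    }
}
```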

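3.4 Adding a worker thread: addWorker

In the execute method above, running a Runnable task requires a worker thread in the pool, which is obtained by calling addWorker.

Specifically, creating and registering a worker thread is implemented in ThreadPoolExecutor.addWorker: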

/**
 * Checks if a new worker can be added with respect to current
 * pool state and the given bound (either core or maximum). If so,
 * the worker count is adjusted accordingly, and, if possible, a
 * new worker is created and started, running firstTask as its
 * first task. This method returns false if the pool is stopped or
 * eligible to shut down. It also returns false if the thread
 * factory fails to create a thread when asked.  If the thread
 * creation fails, either due to the thread factory returning
 * null, or due to an exception (typically OutOfMemoryError in
 * Thread.start()), we roll back cleanly.
 *
 * @param firstTask the task the new thread should run first (or
 * null if none). Workers are created with an initial first task
 * (in method execute()) to bypass queuing when there are fewer
 * than corePoolSize threads (in which case we always start one),
 * or when the queue is full (in which case we must bypass queue).
 * Initially idle threads are usually created via
 * prestartCoreThread or to replace other dying workers.
 *
 * @param core if true use corePoolSize as bound, else
 * maximumPoolSize. (A boolean indicator is used here rather than a
 * value to ensure reads of fresh values after checking other pool
 * state).
 * @return true if successful
 */
private boolean addWorker(Runnable firstTask, boolean core) {
    retry:
    for (int c = ctl.get();;) {
        // Check if queue empty only if necessary.
        if (runStateAtLeast(c, SHUTDOWN)
            && (runStateAtLeast(c, STOP)
                || firstTask != null
                || workQueue.isEmpty()))
            return false;

        for (;;) {
            if (workerCountOf(c)
                >= ((core ? corePoolSize : maximumPoolSize) & COUNT_MASK))
                return false;
            if (compareAndIncrementWorkerCount(c))
                break retry;
            c = ctl.get();  // Re-read ctl
            if (runStateAtLeast(c, SHUTDOWN))
                continue retry;
            // else CAS failed due to workerCount change; retry inner loop
        }
    }

    boolean workerStarted = false;
    boolean workerAdded = false;
    Worker w = null;
    try {
        w = new Worker(firstTask);
        final Thread t = w.thread;
        if (t != null) {
            final ReentrantLock mainLock = this.mainLock;
            mainLock.lock();
            try {
                // Recheck while holding lock.
                // Back out on ThreadFactory failure or if
                // shut down before lock acquired.
                int c = ctl.get();

                if (isRunning(c) ||
                    (runStateLessThan(c, STOP) && firstTask == null)) {
                    if (t.getState() != Thread.State.NEW)
                        throw new IllegalThreadStateException();
                    workers.add(w);
                    workerAdded = true;
                    int s = workers.size();
                    if (s > largestPoolSize)
                        largestPoolSize = s;
                }
            } finally {
                mainLock.unlock();
            }
            if (workerAdded) {
                t.start();
                workerStarted = true;
            }
        }
    } finally {
        if (! workerStarted)
            addWorkerFailed(w);
    }
    return workerStarted;
}

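Here the boolean core argument selects the upper bound on the number of worker threads: if core == true, the thread count may not exceed corePoolSize; otherwise the upper bound is maximumPoolSize.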
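The helpers used in addWorker (workerCountOf, runStateAtLeast, COUNT_MASK) all operate on the single int ctl, which packs the run state into the top 3 bits and the worker count into the low 29 bits. The listing above does not show those definitions, so the sketch below re-creates them as I understand them from the JDK 11 source; CtlDemo and atBound are hypothetical names of mine.

```java
class CtlDemo {
    // Constants mirroring JDK 11's ThreadPoolExecutor (re-created for illustration).
    static final int COUNT_BITS = Integer.SIZE - 3;       // 29 bits for the worker count
    static final int COUNT_MASK = (1 << COUNT_BITS) - 1;  // low 29 bits set

    // Run states live in the high 3 bits; RUNNING is the only negative one.
    static final int RUNNING  = -1 << COUNT_BITS;
    static final int SHUTDOWN =  0 << COUNT_BITS;
    static final int STOP     =  1 << COUNT_BITS;

    static int ctlOf(int runState, int workerCount) { return runState | workerCount; }
    static int workerCountOf(int c)                 { return c & COUNT_MASK; }
    static boolean runStateAtLeast(int c, int s)    { return c >= s; }
    static boolean isRunning(int c)                 { return c < SHUTDOWN; }

    // The bound check from addWorker: has the chosen limit been reached?
    static boolean atBound(int c, boolean core, int corePoolSize, int maximumPoolSize) {
        return workerCountOf(c) >= ((core ? corePoolSize : maximumPoolSize) & COUNT_MASK);
    }

    public static void main(String[] args) {
        int c = ctlOf(RUNNING, 2);
        System.out.println("workers=" + workerCountOf(c) + " running=" + isRunning(c));
        System.out.println("core bound reached: " + atBound(c, true, 2, 4));  // true
        System.out.println("max bound reached:  " + atBound(c, false, 2, 4)); // false
    }
}
```

Packing both values into one atomic int is what lets addWorker check the state and bump the count with a single compare-and-set.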

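3.5 The pool's worker thread: runWorker

The worker threads kept by the pool are the consumer side: they take tasks from the work queue and run them.

Specifically, the main loop of a worker thread is implemented in ThreadPoolExecutor.runWorker: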

/**
 * Main worker run loop.  Repeatedly gets tasks from queue and
 * executes them, while coping with a number of issues:
 *
 * 1. We may start out with an initial task, in which case we
 * don't need to get the first one. Otherwise, as long as pool is
 * running, we get tasks from getTask. If it returns null then the
 * worker exits due to changed pool state or configuration
 * parameters.  Other exits result from exception throws in
 * external code, in which case completedAbruptly holds, which
 * usually leads processWorkerExit to replace this thread.
 *
 * 2. Before running any task, the lock is acquired to prevent
 * other pool interrupts while the task is executing, and then we
 * ensure that unless pool is stopping, this thread does not have
 * its interrupt set.
 *
 * 3. Each task run is preceded by a call to beforeExecute, which
 * might throw an exception, in which case we cause thread to die
 * (breaking loop with completedAbruptly true) without processing
 * the task.
 *
 * 4. Assuming beforeExecute completes normally, we run the task,
 * gathering any of its thrown exceptions to send to afterExecute.
 * We separately handle RuntimeException, Error (both of which the
 * specs guarantee that we trap) and arbitrary Throwables.
 * Because we cannot rethrow Throwables within Runnable.run, we
 * wrap them within Errors on the way out (to the thread's
 * UncaughtExceptionHandler).  Any thrown exception also
 * conservatively causes thread to die.
 *
 * 5. After task.run completes, we call afterExecute, which may
 * also throw an exception, which will also cause thread to
 * die. According to JLS Sec 14.20, this exception is the one that
 * will be in effect even if task.run throws.
 *
 * The net effect of the exception mechanics is that afterExecute
 * and the thread's UncaughtExceptionHandler have as accurate
 * information as we can provide about any problems encountered by
 * user code.
 *
 * @param w the worker
 */
final void runWorker(Worker w) {
    Thread wt = Thread.currentThread();
    Runnable task = w.firstTask;
    w.firstTask = null;
    w.unlock(); // allow interrupts
    boolean completedAbruptly = true;
    try {
        while (task != null || (task = getTask()) != null) {
            w.lock();
            // If pool is stopping, ensure thread is interrupted;
            // if not, ensure thread is not interrupted.  This
            // requires a recheck in second case to deal with
            // shutdownNow race while clearing interrupt
            if ((runStateAtLeast(ctl.get(), STOP) ||
                 (Thread.interrupted() &&
                  runStateAtLeast(ctl.get(), STOP))) &&
                !wt.isInterrupted())
                wt.interrupt();
            try {
                beforeExecute(wt, task);
                try {
                    task.run();
                    afterExecute(task, null);
                } catch (Throwable ex) {
                    afterExecute(task, ex);
                    throw ex;
                }
            } finally {
                task = null;
                w.completedTasks++;
                w.unlock();
            }
        }
        completedAbruptly = false;
    } finally {
        processWorkerExit(w, completedAbruptly);
    }
}

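At the start, task is initialized from w.firstTask, a field of the Worker (not of the pool itself) that refers to the worker's initial task and is null when the worker was created without one. After that, the worker thread acts as a consumer of workQueue and obtains every further task through getTask().

After obtaining a task, it carefully checks the pool state and calls beforeExecute and afterExecute before and after running the task. Both methods have empty bodies in ThreadPoolExecutor, that is, they do nothing. They are hooks that a subclass can override to add extra checks and features (for example, logging).

Running the task itself is simple: the worker thread just calls task.run() on the Runnable instance.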
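As an illustration of the two hooks, the sketch below subclasses ThreadPoolExecutor and records an event before and after each task. LoggingPoolDemo, LoggingPool and the events list are hypothetical names, and an in-memory list stands in for a real logger.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class LoggingPoolDemo {
    // A single-thread pool that records when its hooks fire.
    static class LoggingPool extends ThreadPoolExecutor {
        final List<String> events = new CopyOnWriteArrayList<>();

        LoggingPool() {
            super(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        }

        @Override
        protected void beforeExecute(Thread t, Runnable r) {
            super.beforeExecute(t, r);
            events.add("before");
        }

        @Override
        protected void afterExecute(Runnable r, Throwable t) {
            super.afterExecute(r, t);
            events.add(t == null ? "after" : "after (failed)");
        }
    }

    // Runs one task and returns the recorded sequence of events.
    static String runOnce() {
        LoggingPool pool = new LoggingPool();
        pool.execute(() -> pool.events.add("run"));
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not terminate");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return pool.events.toString();
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // [before, run, after]
    }
}
```

The example uses execute rather than submit on purpose: with submit the task is wrapped in a FutureTask, which captures exceptions itself, so afterExecute would receive null even for a failed task.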

3.6 Taking a task from the work queue: getTask

The implementation of ThreadPoolExecutor.getTask():

/**
 * Performs blocking or timed wait for a task, depending on
 * current configuration settings, or returns null if this worker
 * must exit because of any of:
 * 1. There are more than maximumPoolSize workers (due to
 *    a call to setMaximumPoolSize).
 * 2. The pool is stopped.
 * 3. The pool is shutdown and the queue is empty.
 * 4. This worker timed out waiting for a task, and timed-out
 *    workers are subject to termination (that is,
 *    {@code allowCoreThreadTimeOut || workerCount > corePoolSize})
 *    both before and after the timed wait, and if the queue is
 *    non-empty, this worker is not the last thread in the pool.
 *
 * @return task, or null if the worker must exit, in which case
 *         workerCount is decremented
 */
private Runnable getTask() {
    boolean timedOut = false; // Did the last poll() time out?

    for (;;) {
        int c = ctl.get();

        // Check if queue empty only if necessary.
        if (runStateAtLeast(c, SHUTDOWN)
            && (runStateAtLeast(c, STOP) || workQueue.isEmpty())) {
            decrementWorkerCount();
            return null;
        }

        int wc = workerCountOf(c);

        // Are workers subject to culling?
        boolean timed = allowCoreThreadTimeOut || wc > corePoolSize;

        if ((wc > maximumPoolSize || (timed && timedOut))
            && (wc > 1 || workQueue.isEmpty())) {
            if (compareAndDecrementWorkerCount(c))
                return null;
            continue;
        }

        try {
            Runnable r = timed ?
                workQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS) :
                workQueue.take();
            if (r != null)
                return r;
            timedOut = true;
        } catch (InterruptedException retry) {
            timedOut = false;
        }
    }
}

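As the code shows, the timed variable decides between a timed wait and a blocking read.

If timed == true, the main loop waits for a task from the work queue for a limited time, using workQueue.poll:
- If the work queue has a task, it returns immediately;
- If not, it waits for the given time and returns null if there is still none.
- (Retrieves and removes the head of this queue, waiting up to the specified wait time if necessary for an element to become available.)
If timed == false, the main loop blocks on the work queue, using workQueue.take:
- If the work queue has a task, it returns immediately;
- If not, the thread blocks until a producer adds a task, and then returns it.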
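The difference between the two reads is easy to reproduce on a standalone queue. In this sketch (PollVsTakeDemo is a hypothetical name, and the 50 ms timeout is an arbitrary value standing in for keepAliveTime), the timed poll gives up with null, while take waits for a producer.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class PollVsTakeDemo {
    // Timed read on an empty queue: returns null once the timeout elapses,
    // which is the signal getTask() uses to let an idle worker exit.
    static String timedPollOnEmptyQueue() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        try {
            return queue.poll(50, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Blocking read: take() waits until a producer thread supplies an element.
    static String takeWaitsForProducer() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> queue.offer("task"));
        producer.start();
        try {
            return queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(timedPollOnEmptyQueue()); // null
        System.out.println(takeWaitsForProducer());  // task
    }
}
```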

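4 J.U.C blocking queues

Section 3.2 (execute) above explained why LinkedBlockingQueue and SynchronousQueue are used for different ThreadPoolExecutor configurations. The synchronous queue in particular lets CachedThreadPool detect, through a failed insert, that there are not enough reading threads, and add threads to the pool accordingly.

4.1 A synchronous queue with no capacity: SynchronousQueue

This container is rather special: although it is called a queue, it has no capacity at all.

A blocking queue in which each insert operation must wait for a corresponding remove operation by another thread, and vice versa. A synchronous queue does not have any internal capacity, not even a capacity of one. You cannot peek at a synchronous queue because an element is only present when you try to remove it; you cannot insert an element (using any method) unless another thread is trying to remove it; you cannot iterate as there is nothing to iterate. The head of the queue is the element that the first queued inserting thread is trying to add to the queue; if there is no such queued thread then no element is available for removal and poll() will return null. For purposes of other Collection methods (for example contains), a SynchronousQueue acts as an empty collection. This queue does not permit null elements.

With this container, a take has to be waiting before an insert can complete. That is, if no thread is reading from a synchronous queue, other threads cannot insert new data into it: offer fails immediately and put blocks. Typically, a reader thread is started first; since the queue holds no data yet, that reader blocks and waits. Then another thread inserts data into the queue, at which point the blocked reader wakes up and receives it.

Because an insert requires a waiting reader, the container is used to detect that there are fewer reading threads than inserted tasks, prompting the thread pool to add new threads.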
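Both behaviors can be checked in a few lines. This is a small sketch with hypothetical names (SynchronousQueueDemo, offerWithoutTaker, handOff): the first method shows the failing offer that the cached pool relies on, the second shows a blocking hand-off to a reader thread.

```java
import java.util.concurrent.SynchronousQueue;

class SynchronousQueueDemo {
    // With no thread waiting to take, a non-blocking offer fails immediately.
    static boolean offerWithoutTaker() {
        return new SynchronousQueue<String>().offer("task");
    }

    // put() blocks until the taker thread arrives, then hands the element over.
    static String handOff() {
        SynchronousQueue<String> queue = new SynchronousQueue<>();
        String[] received = new String[1];
        Thread taker = new Thread(() -> {
            try {
                received[0] = queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        taker.start();
        try {
            queue.put("task");
            taker.join(); // makes the taker's write to received[0] visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received[0];
    }

    public static void main(String[] args) {
        System.out.println(offerWithoutTaker()); // false
        System.out.println(handOff());           // task
    }
}
```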

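4.2 A linked blocking queue: LinkedBlockingQueue

As the name suggests, this is a blocking queue implemented on a linked list.

An optionally-bounded blocking queue based on linked nodes. This queue orders elements FIFO (first-in-first-out). The head of the queue is that element that has been on the queue the longest time. The tail of the queue is that element that has been on the queue the shortest time. New elements are inserted at the tail of the queue, and the queue retrieval operations obtain elements at the head of the queue. Linked queues typically have higher throughput than array-based queues but less predictable performance in most concurrent applications.

Being linked-list based, it is generally thought of as unbounded, although a capacity can be specified to make it bounded.

The linked blocking queue is FIFO: the head is the element inserted earliest, the tail is the one inserted most recently, and elements are removed in FIFO order.

In concurrent applications a linked blocking queue usually has higher throughput than an array-based one, because it does not guard the whole container with one lock. It uses separate locks for the enqueue end and the dequeue end, so a producer and a consumer can proceed at the same time.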
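The "optionally bounded" part and the FIFO order can be seen directly. A minimal sketch, with LinkedBlockingQueueDemo and its method names being hypothetical and the capacity of 2 an arbitrary example value:

```java
import java.util.concurrent.LinkedBlockingQueue;

class LinkedBlockingQueueDemo {
    // Without an explicit capacity the queue is effectively unbounded.
    static int defaultCapacity() {
        return new LinkedBlockingQueue<String>().remainingCapacity();
    }

    // Fills a queue bounded to two elements; the third offer is refused,
    // and the head is the element that was inserted first.
    static String boundedFifo() {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>(2);
        boolean first = queue.offer("a");
        boolean second = queue.offer("b");
        boolean third = queue.offer("c"); // queue is full, so this returns false
        return first + "," + second + "," + third + " head=" + queue.poll();
    }

    public static void main(String[] args) {
        System.out.println(defaultCapacity() == Integer.MAX_VALUE); // true
        System.out.println(boundedFifo()); // true,true,false head=a
    }
}
```

Passing a bounded LinkedBlockingQueue to the ThreadPoolExecutor constructor is one way to cap the memory that waiting tasks can consume.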

4.3 An array blocking queue: ArrayBlockingQueue

As the name suggests, this is a blocking queue implemented on an array.

A bounded blocking queue backed by an array. This queue orders elements FIFO (first-in-first-out). The head of the queue is that element that has been on the queue the longest time. The tail of the queue is that element that has been on the queue the shortest time. New elements are inserted at the tail of the queue, and the queue retrieval operations obtain elements at the head of the queue. This is a classic "bounded buffer", in which a fixed-sized array holds elements inserted by producers and extracted by consumers. Once created, the capacity cannot be changed. Attempts to put an element into a full queue will result in the operation blocking; attempts to take an element from an empty queue will similarly block.
This class supports an optional fairness policy for ordering waiting producer and consumer threads. By default, this ordering is not guaranteed. However, a queue constructed with fairness set to true grants threads access in FIFO order. Fairness generally decreases throughput but reduces variability and avoids starvation.

Being array based, the container is necessarily bounded: the capacity is fixed at creation and cannot change. So when the array is full, inserting a new element blocks until an element is taken, and likewise, when the array is empty, a read blocks until an element is inserted.

The array blocking queue also offers a fairness policy. If waiting producer and consumer threads must be served strictly in FIFO order, enable the fairness policy; this avoids starvation (otherwise some waiting threads might not get their turn for a long time) but also reduces throughput.
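A short sketch of the bounded behavior, using a fair queue of capacity 2. ArrayBlockingQueueDemo is a hypothetical name and the 50 ms timeout is arbitrary; a timed offer is used in place of put so that the example does not hang on the full queue.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

class ArrayBlockingQueueDemo {
    // Fills a fair queue of capacity 2, shows that a third insert cannot
    // proceed while the queue is full, then drains it in FIFO order.
    static String fillAndDrain() {
        ArrayBlockingQueue<Integer> queue = new ArrayBlockingQueue<>(2, true); // fair = true
        try {
            queue.put(1);
            queue.put(2);
            // A third put() would block here; the timed offer gives up instead.
            boolean accepted = queue.offer(3, 50, TimeUnit.MILLISECONDS);
            return "accepted=" + accepted
                    + " remaining=" + queue.remainingCapacity()
                    + " drained=" + queue.take() + "," + queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fillAndDrain()); // accepted=false remaining=0 drained=1,2
    }
}
```

With a single thread the fairness flag makes no observable difference; it only matters when several producers or consumers are blocked on the queue at once.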