NLP-AR与AE的原理

AR原理

AR是自回归的模型(AutoRegressive LM),是一种使用上下文词来预测下一个词的模型。但是在这里,上下文单词被限制在两个方向,前向或后向。

AR的代表有:

  1. 传统的语言模型,根据上文预测下一个词。
  2. ELMo扩展了语言模型,增加了双向词的预测,上文预测下一个词和下文预测上一个词,但是本质上还是AR的原理。
  3. 再到GPT是把AR发挥到极致的做法,在AR的基础上,提升预料的质量,加大训练的资源,最终训练出相当不错的效果。

AR的优点和缺点:

  • 缺点是只能利用上文或者下文的信息,不能同时利用上文和下文的信息。当然,貌似ELMO这种双向都做,然后拼接看上去能够解决这个问题,因为融合模式过于简单,所以效果其实并不是太好。
  • 优点是符合下游NLP任务语言环境,比如生成类NLP任务,比如文本摘要,机器翻译等,在实际生成内容的时候,就是从左向右的,自回归语言模型天然匹配这个过程。而Bert这种DAE模式,在生成类NLP任务中,就面临训练过程和应用过程不一致的问题,导致生成类的NLP任务到目前为止都做不太好。

AE原理

AE是自编码语言模型(AutoEncoder LM),它能比较自然地融入双向语言模型,同时看到被预测单词的上文和下文。

  1. Bert通过在输入X中随机Mask掉一部分单词,然后预训练过程的主要任务之一是根据上下文单词来预测这些被Mask掉的单词,如果你对Denoising Autoencoder比较熟悉的话,会看出,这确实是典型的DAE的思路。那些被Mask掉的单词就是在输入侧加入的所谓噪音。类似Bert这种预训练模式,被称为DAE LM。

AE的优点和缺点:

  • 优点是能比较自然地融入双向语言模型,同时看到被预测单词的上文和下文
  • 缺点是在训练的输入端引入[Mask]标记,导致预训练阶段和Fine-tuning阶段不一致的问题。

Ubunt18.04安装Tensorflow Docker

准备

查看ubuntu系统是32位的还是64位的:

getconf LONG_BIT

查看系统信息:

lsb_release -a

查看操作系统架构:uname -a

卸载旧版本

sudo apt-get remove docker docker-engine docker.io containerd runc

/var/lib/docker的内容,包括镜像、容器、卷和网络,可以保留也可以删除。

执行之后,如果输入docker –version仍能看到docker版本,采用另一种方式:

sudo apt-get purge docker
sudo apt-get purge docker-ce
sudo apt-get remove -y docker-*

注意:(apt-get remove 会删除软件包而保留软件的配置文件
apt-get purge 会同时清除软件包和软件的配置文件)

Install using the repository

1)sudo apt-get update

2)允许apt通过https使用repository安装软件包 

sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common

 3)添加Docker官方GPG key

sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

(国内阿里云版 sudo curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | apt-key add -

4)验证key的指纹

  sudo apt-key fingerprint 0EBFCD88

正常输出为:

pub   rsa4096 2017-02-22 [SCEA]
      9DC8 5822 9FC7 DD38 854A  E2D8 8D81 803C 0EBF CD88
uid           [ unknown] Docker Release (CE deb) <docker@docker.com>
sub   rsa4096 2017-02-22 [S]

5)添加稳定版repository

sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"

国内阿里云版:

sudo add-apt-repository \
   "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu \
   $(lsb_release -cs) \
   stable"

5)sudo apt-get update

6)安装最新版本的docker ce和containerd

sudo apt-get install docker-ce docker-ce-cli containerd.io

(如果您启用了多个Docker存储库,则在apt-get install或apt-get update命令中未指定版本的情况下安装或更新将始终安装尽可能高的版本)

7)安装指定版本的

查看可获取的版本 apt-cache madison docker-ce

sudo apt-get install docker-ce=<VERSION_STRING> docker-ce-cli=<VERSION_STRING> containerd.io

 8)验证:docker –version

sudo docker run hello-world

 9)将非root用户加入docker组,以允许免sudo执行docker

sudo gpasswd -a 用户名 docker

 重启服务并刷新docker组成员

sudo service docker restart
newgrp - docker

10)设置开机自启动并启动 Docker-ce(安装成功后默认已设置并启动,可忽略)

sudo systemctl enable docker
sudo systemctl start docker

11)升级版本

a) sudo apt-get update

b) 按照以上步骤安装新版本

12)安装docker-compose

https://www.runoob.com/docker/docker-compose.html

sudo curl -L https://github.com/docker/compose/releases/download/1.25.4/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose

sudo chmod +x /usr/local/bin/docker-compose

sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose

docker-compose –version

配置

Configure containerd with a default config.toml configuration file:

sudo mkdir -p /etc/containerd \
    && sudo containerd config default | sudo tee /etc/containerd/config.toml

For using the NVIDIA runtime, additional configuration is required. The following options should be added to configure nvidia as a runtime and use systemd as the cgroup driver. A patch is provided below:

cat <<EOF > containerd-config.patch
--- config.toml.orig    2020-12-18 18:21:41.884984894 +0000
+++ /etc/containerd/config.toml 2020-12-18 18:23:38.137796223 +0000
@@ -94,6 +94,15 @@
        privileged_without_host_devices = false
        base_runtime_spec = ""
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
+            SystemdCgroup = true
+       [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
+          privileged_without_host_devices = false
+          runtime_engine = ""
+          runtime_root = ""
+          runtime_type = "io.containerd.runc.v1"
+          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
+            BinaryName = "/usr/bin/nvidia-container-runtime"
+            SystemdCgroup = true
    [plugins."io.containerd.grpc.v1.cri".cni]
    bin_dir = "/opt/cni/bin"
    conf_dir = "/etc/cni/net.d"
EOF

After apply the configuration patch, restart containerd:

sudo systemctl restart containerd

You can test the installation by using the Docker hello-world container with the ctr tool:

sudo ctr image pull docker.io/library/hello-world:latest \
    && sudo ctr run --rm -t docker.io/library/hello-world:latest hello-world

Install NVIDIA Container Toolkit

First, setup the package repository and GPG key:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
    && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
    && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

Now, install the NVIDIA runtime:

sudo apt-get update \
    && sudo apt-get install -y nvidia-container-runtime

Then, we can test a GPU container:

sudo ctr image pull docker.io/nvidia/cuda:11.0-base
sudo ctr run --rm --gpus 0 -t docker.io/nvidia/cuda:11.0-base cuda-11.0-base nvidia-smi

You should see an output similar to the one shown below:

Tensorflow Docker

Download a TensorFlow Docker image

The official TensorFlow Docker images are located in the tensorflow/tensorflow Docker Hub repository. Image releases are tagged using the following format:

TagDescription
latestThe latest release of TensorFlow CPU binary image. Default.
nightlyNightly builds of the TensorFlow image. (Unstable.)
versionSpecify the version of the TensorFlow binary image, for example: 2.1.0
develNightly builds of a TensorFlow master development environment. Includes TensorFlow source code.
custom-opSpecial experimental image for developing TF custom ops. More info here.

Each base tag has variants that add or change functionality:

Tag VariantsDescription
tag-gpuThe specified tag release with GPU support. (See below)
tag-jupyterThe specified tag release with Jupyter (includes TensorFlow tutorial notebooks)

You can use multiple variants at once. For example, the following downloads TensorFlow release images to your machine:

docker pull tensorflow/tensorflow                     # latest stable release
docker pull tensorflow/tensorflow:devel-gpu           # nightly dev release w/ GPU support
docker pull tensorflow/tensorflow:latest-gpu-jupyter  # latest release w/ GPU support and Jupyter

Start a TensorFlow Docker container

docker run [-it] [--rm] [-p hostPort:containerPort] tensorflow/tensorflow[:tag] [command]

CPU-only images:

docker run -it --rm tensorflow/tensorflow \
   python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

 Start a bash shell session within a TensorFlow-configured container:

docker run -it tensorflow/tensorflow bash

To run a TensorFlow program developed on the host machine within a container, mount the host directory and change the container’s working directory (-v hostDir:containerDir -w workDir):

docker run -it --rm -v $PWD:/tmp -w /tmp tensorflow/tensorflow python ./script.py

GPU support

Check if a GPU is available:

lspci | grep -i nvidia

Install the Nvidia Container Toolkit –containerd

Step 0: Pre-Requisites

sudo modprobe overlay \
    && sudo modprobe br_netfilter

You can also ensure these are persistent:

cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF

Step 1: Install containerd

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker

After apply the configuration patch, restart containerd:

sudo systemctl restart containerd

You can test the installation by using the Docker hello-world container with the ctr tool:

sudo ctr image pull docker.io/library/hello-world:latest \
    && sudo ctr run --rm -t docker.io/library/hello-world:latest hello-world

Step 2: Install NVIDIA Container Toolkit

First, setup the package repository and GPG key:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
    && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
    && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

Now, install the NVIDIA runtime:

sudo apt-get update \
    && sudo apt-get install -y nvidia-container-runtime

Then, we can test a GPU container:

sudo ctr image pull docker.io/nvidia/cuda:11.0-base
sudo ctr run --rm --gpus 0 -t docker.io/nvidia/cuda:11.0-base cuda-11.0-base nvidia-smi

Verify your nvidia-docker installation:

docker run --gpus all --rm nvidia/cuda:11.0-base nvidia-smi

Examples using GPU-enabled images

Download and run a GPU-enabled TensorFlow image (may take a few minutes):

docker run --gpus all -it --rm tensorflow/tensorflow:latest-gpu \
   python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

Use the latest TensorFlow GPU image to start a bash shell session in the container:

docker run --gpus all -it tensorflow/tensorflow:latest-gpu bash

Reference

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker

https://www.cnblogs.com/walker-lin/p/11214127.html

https://www.tensorflow.org/install/docker

https://github.com/NVIDIA/nvidia-docker/blob/master/README.md#quickstart

Tensorflow Docker on Ubuntu20.04

Download a TensorFlow Docker image

The official TensorFlow Docker images are located in the tensorflow/tensorflow Docker Hub repository. Image releases are tagged using the following format:

TagDescription
latestThe latest release of TensorFlow CPU binary image. Default.
nightlyNightly builds of the TensorFlow image. (Unstable.)
versionSpecify the version of the TensorFlow binary image, for example: 2.1.0
develNightly builds of a TensorFlow master development environment. Includes TensorFlow source code.
custom-opSpecial experimental image for developing TF custom ops. More info here.

Each base tag has variants that add or change functionality:

Tag VariantsDescription
tag-gpuThe specified tag release with GPU support. (See below)
tag-jupyterThe specified tag release with Jupyter (includes TensorFlow tutorial notebooks)

Install

You can use multiple variants at once. For example, the following downloads TensorFlow release images to your machine:

docker pull tensorflow/tensorflow                     # latest stable release
docker pull tensorflow/tensorflow:devel-gpu           # nightly dev release w/ GPU support
docker pull tensorflow/tensorflow:latest-gpu-jupyter  # latest release w/ GPU support and Jupyter

Start a TensorFlow Docker container

To start a TensorFlow-configured container, use the following command form:

docker run [-it] [--rm] [-p hostPort:containerPort] tensorflow/tensorflow[:tag] [command]

Examples using CPU-only images

docker run -it --rm tensorflow/tensorflow \
   python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

Let’s demonstrate some more TensorFlow Docker recipes. Start a bash shell session within a TensorFlow-configured container:

docker run -it tensorflow/tensorflow bash

Within the container, you can start a python session and import TensorFlow.

To run a TensorFlow program developed on the host machine within a container, mount the host directory and change the container’s working directory (-v hostDir:containerDir -w workDir):

docker run -it --rm -v $PWD:/tmp -w /tmp tensorflow/tensorflow python ./script.py

GPU support

Docker is the easiest way to run TensorFlow on a GPU since the host machine only requires the NVIDIA® driver (the NVIDIA® CUDA® Toolkit is not required).

Install the Nvidia Container Toolkit to add NVIDIA® GPU support to Docker. nvidia-container-runtime is only available for Linux. See the nvidia-container-runtime platform support FAQ for details.

Check if a GPU is available:

lspci | grep -i nvidia

Verify your nvidia-docker installation:

docker run --gpus all --rm nvidia/cuda nvidia-smi

Note:nvidia-docker v2 uses --runtime=nvidia instead of --gpus allnvidia-docker v1 uses the nvidia-docker alias, rather than the --runtime=nvidia or --gpus all command line flags.

Examples using GPU-enabled images

Download and run a GPU-enabled TensorFlow image (may take a few minutes):

docker run --gpus all -it --rm tensorflow/tensorflow:latest-gpu \
   python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

Use the latest TensorFlow GPU image to start a bash shell session in the container:

docker run --gpus all -it tensorflow/tensorflow:latest-gpu bash

Reference

https://www.tensorflow.org/install/docker

Ubuntu20 Docker

Uninstall old versions

 sudo apt-get remove docker docker-engine docker.io containerd runc

Install using the repository

Set up the repository

  • Update the apt package index and install packages to allow apt to use a repository over HTTPS:
 sudo apt-get update
 sudo apt-get install \
    ca-certificates \
    curl \
    gnupg \
    lsb-release
  • Add Docker’s official GPG key:
 curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
  • Use the following command to set up the stable repository. To add the nightly or test repository, add the word nightly or test (or both) after the word stable in the commands below. Learn about nightly and test channels.
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

Install Docker Engine

  • Update the apt package index, and install the latest version of Docker Engine and containerd, or go to the next step to install a specific version:
$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli containerd.io
  • To install a specific version of Docker Engine, list the available versions in the repo, then select and install:

List the versions available in your repo:

apt-cache madison docker-ce
sudo apt-get install docker-ce=<VERSION_STRING> docker-ce-cli=<VERSION_STRING> containerd.io

Verify that Docker Engine is installed correctly by running the hello-world image.

 sudo docker run hello-world

Reference

https://docs.docker.com/engine/install/

.jfif是什么格式

一些网站下载图片后发现有些图片文件的后缀是.jfif的,而不是常见的.jpg或.png的,什么是.jfif文件?

.jfif是什么格式?

.jfif是一种图片存储格式,该格式直接使用JPEG标准但比普通JPEG包含较少的数据。它使JPEG比特流的交换数量的应用程序和平台之间。可以用任何图片浏览器或Web浏览器的帮助下进行查看。

PS:在搜索.jfif后缀的过程中还发现了一个比较有意思的网站,可以搜索各种文件后缀的含义。

https://www.filedesc.com/zh/

.jfif格式文件如何转成.jpg文件?

基本上能打开.jpg文件的软件都可以打开.jfif,如果没有图片转换软件直接改后缀名称一样有效。

最彻底的办法,直接把.jfif文件直接保存成.jpg文件

1、打开注册表编辑器(不会打开注册表的自行百度)

2、在注册表地址栏中输入

HKEY_CLASSES_ROOT\MIME\Database\Content Type\image/jpeg,然后将 “Extension” 的值改成.jpg

点击确认,之后在保存的都是.jpg文件了。