This project is mainly about getting familiar with RabbitMQ, MongoDB, Docker, and Kubernetes. In these notes I'll share the whole journey, bits and pieces included (along with a few pitfalls I hit 😅).
The big picture
The crawler fetches YouBike data once every minute, splits it into individual station records, and pushes them into a message queue (RabbitMQ).
When the worker learns there is something new in the queue, it takes the items out, processes them one by one, and stores them in a NoSQL database (MongoDB).
The client calls the Express server's API to find out how many YouBikes are left at the three stations nearest the user.
After testing the whole flow locally, I first practiced building the environment with Docker.
Once that was OK, I practiced building it again with Kubernetes — or more precisely, with k3s + k3d.
Implementing the features

Dataset used in this project: "YouBike臺北市公共自行車即時資訊JSON", plus the descriptions of its main fields (only visible after clicking 資料資源下載網址 → 檢視資料).
Setting up the project

```shell
mkdir youbike-helper
cd youbike-helper
npm init
git init
npm install express
touch .gitignore
touch app.js
nodemon app.js
```
Doing things on a schedule
can print message every minute with node-cron
Fetching the data
can fetch online api json data with node-fetch
```shell
npm install node-fetch@2
```

(the reason for choosing v2)
For how to use it, see node-fetch's Common Usage → JSON.
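As a sketch of what the crawler then does with that JSON: it splits the payload into one record per station before queueing them. The payload below is made up, and its shape (a `retVal` object keyed by station ID, with `sna`/`sbi`/`lat`/`lng`/`mday` fields) is my assumption based on how worker.js reads the data later — check it against the actual dataset:

```javascript
// Hypothetical example payload; the real one comes from node-fetch's
// res.json(). Field names follow the dataset (sna = station name,
// sbi = available bikes, mday = data timestamp).
const payload = {
  retVal: {
    '0001': { sna: 'MRT Taipei City Hall Stn.', sbi: '12', lat: '25.0408', lng: '121.5678', mday: '20220101123456' },
    '0002': { sna: 'MRT S.Y.S Memorial Hall Stn.', sbi: '5', lat: '25.0412', lng: '121.5575', mday: '20220101123456' }
  }
};

// One message per station goes into the queue.
const stations = Object.values(payload.retVal);
for (const info of stations) {
  console.log(`${info.sna}: ${info.sbi} bikes available`);
}
```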
Message Queue
can publish & consume object with amqplib
Install RabbitMQ: `brew install rabbitmq`
Start it: `rabbitmq-server`

```shell
npm install amqplib
```
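One detail worth noting before wiring this up: amqplib's `sendToQueue` takes a Buffer, not a plain object, so each station record has to be JSON-serialized on the way in and parsed back on the way out. A minimal sketch of just that round trip (the actual channel calls appear only as comments; the queue name is hypothetical):

```javascript
// A station record we want to publish (made-up values).
const stationInfo = { sna: 'some-station', sbi: '7' };

// Publisher side: object -> Buffer before sendToQueue.
const message = Buffer.from(JSON.stringify(stationInfo));
// channel.sendToQueue('stations', message);  // 'stations' is a hypothetical queue name

// Consumer side: msg.content is a Buffer -> back to an object.
const received = JSON.parse(message.toString());
console.log(received);
```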
Integrating MongoDB
can insert data we want into MongoDB with mongoose
```shell
mkdir -p ~/mongos/db7
mongod --port 2777 --dbpath ~/mongos/db7
```

```shell
npm install dotenv --save
```

→ `require('dotenv').config()`
The Date-typed field is based on this reference.

Code

model.js: defines the schema.

Then modify worker.js so it can create new documents:
```javascript
const { Info } = require('./model')

const newInfo = {
  station: info['sna'],
  version: 'v1',
  available: info['sbi'],
  location: {
    type: 'Point',
    coordinates: [
      Number(info['lat']).toFixed(4),
      Number(info['lng']).toFixed(4)
    ]
  },
  datatime: new Date(`${info['mday'].slice(0,4)}-${info['mday'].slice(4,6)}-${info['mday'].slice(6,8)}T${info['mday'].slice(8,10)}:${info['mday'].slice(10,12)}:${info['mday'].slice(12,14)}`)
};

const theInfo = await Info.create(newInfo);
console.log(theInfo);
console.log('==============================');
```
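The `datatime` line above crams the whole conversion into one template string; here is the same slicing as a standalone helper (the helper name is mine), just to make the "YYYYMMDDhhmmss → Date" idea easier to follow:

```javascript
// mday arrives as 'YYYYMMDDhhmmss'; slice it into an ISO-like string
// ('YYYY-MM-DDThh:mm:ss', no timezone suffix, so it's read as local time).
function parseMday(mday) {
  return new Date(
    `${mday.slice(0, 4)}-${mday.slice(4, 6)}-${mday.slice(6, 8)}` +
    `T${mday.slice(8, 10)}:${mday.slice(10, 12)}:${mday.slice(12, 14)}`
  );
}

const d = parseMday('20220315083000');
console.log(d.getFullYear(), d.getMonth() + 1, d.getDate()); // 2022 3 15
```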
Showing the nearest stations
can print nearest stations regarding a point with aggregate and $geoNear
🎉 Mainly achieved by following the "Or using aggregate instead" approach in this post 🎉
```javascript
Info.aggregate(
  [
    {
      '$geoNear': {
        'near': { 'type': 'Point', 'coordinates': [121.5312, 25.0299] },
        'spherical': true,
        'distanceField': 'distance'
      },
    },
    { '$skip': 0 },
    { '$limit': 3 }
  ],
  (err, nearestInfos) => {
    if (err) throw err;
    for (const info of nearestInfos) {
      console.log(info['station']);
      console.log(info['available']);
      console.log('==============================');
    }
  }
);
```
The new model.js, reworked so $geoNear can work (adapted from this post):
```javascript
const mongoose = require('mongoose');
const { Schema } = mongoose;
require('dotenv').config();

const URI = process.env.MONGODB_URI;
mongoose.connect(`${URI}/demo`);

const db = mongoose.connection;
db.on('error', console.error.bind(console, 'MongoDB connection error:'));

const infoSchema = new Schema({
  station: {
    type: String,
    required: true,
  },
  version: {
    type: String,
    required: true,
  },
  available: {
    type: Number,
    required: true,
  },
  location: {
    type: {
      type: String,
      enum: ['Point'],
      default: 'Point',
    },
    coordinates: {
      type: [Number],
    }
  },
  datatime: {
    type: Date,
    required: true,
  },
}, { strict: 'throw' });

infoSchema.index({ location: '2dsphere' });

const Info = mongoose.model('Info', infoSchema);
module.exports = { Info };
```
worker.js — the longitude/latitude order also has to be swapped, or it throws an error! (this post mentions it)
```javascript
coordinates: [
  Number(Number(info['lng']).toFixed(4)),
  Number(Number(info['lat']).toFixed(4))
]
```
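To put the gotcha in one place: GeoJSON points are stored as [longitude, latitude], the reverse of the everyday "lat, lng" order, and `toFixed()` returns a string, hence the extra `Number()` wrapper. A small helper of my own that captures both fixes:

```javascript
// GeoJSON wants [lng, lat] — note the swapped order!
// toFixed(4) rounds to 4 decimal places but yields a string,
// so wrap it in Number() again to store actual numbers.
function toGeoJsonCoordinates(lat, lng) {
  return [
    Number(Number(lng).toFixed(4)),
    Number(Number(lat).toFixed(4))
  ];
}

console.log(toGeoJsonCoordinates('25.04083', '121.56783')); // [ 121.5678, 25.0408 ]
```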
Actually showing it to users
can show to users with ejs and geolocation
Switching to upsert instead of inserting forever
change to upsert to prevent infinite growth
```javascript
const theInfo = await Info.findOneAndUpdate({
  station: info['sna'],
  version: 'v1',
}, newInfo, { upsert: true });
```
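Why this stops the unbounded growth: with `upsert: true`, `findOneAndUpdate` updates the matching document when one exists and inserts only when none does, so each (station, version) pair occupies exactly one document. A toy in-memory illustration of the semantics (plain JS, not mongoose):

```javascript
// A plain array standing in for the MongoDB collection.
const collection = [];

// Update the first document matching `filter`, or insert `doc` if none matches.
function upsert(filter, doc) {
  const existing = collection.find(d =>
    Object.entries(filter).every(([k, v]) => d[k] === v));
  if (existing) {
    Object.assign(existing, doc);  // update in place
  } else {
    collection.push({ ...doc });   // insert only when missing
  }
}

upsert({ station: 'A', version: 'v1' }, { station: 'A', version: 'v1', available: 3 });
upsert({ station: 'A', version: 'v1' }, { station: 'A', version: 'v1', available: 9 });

console.log(collection.length);        // 1 — no unbounded growth
console.log(collection[0].available);  // 9 — the latest value wins
```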
Environment variables

By the way, my .env looks like this:

```
MONGODB_URI='mongodb://localhost:2777'
AMQP_URI='amqp://localhost'
SEARCH_API_URL='http://localhost:3000/search'
```
Results

Testing locally:

```shell
rabbitmq-server
mongod --port 2777 --dbpath ~/mongos/db7
nodemon app.js
```

Nice!
Docker

Running multiple Docker containers
can do the same job on docker containers with docker-compose up
First, get the Docker daemon running.
Write the backend's Dockerfile: `touch Dockerfile`
```dockerfile
FROM node:lts-alpine
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
COPY . /usr/src/app
RUN npm install -g nodemon
EXPOSE 3000
CMD [ "npm", "start" ]
```
docker-compose.yml: combines the containers into one complete service.
I also referred to "How to Set Up RabbitMQ With Docker Compose" and "Docker Compose wait for container X before starting Y".
```yaml
version: "3.9"
services:
  rabbitmq:
    image: rabbitmq:3-management-alpine
    container_name: 'rabbitmq'
    ports:
      - 5672:5672
      - 15672:15672
    volumes:
      - ~/.docker-conf/rabbitmq/data/:/var/lib/rabbitmq/
      - ~/.docker-conf/rabbitmq/log/:/var/log/rabbitmq
    healthcheck:
      test: rabbitmq-diagnostics -q ping
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 3s
  mongo:
    image: mongo
    ports:
      - "27017:27017"
    healthcheck:
      test: echo 'db.runCommand("ping").ok' | mongo localhost:27017/test --quiet
      interval: 2s
      timeout: 3s
      retries: 3
      start_period: 3s
  backend:
    build:
      context: .
    ports:
      - "3000:3000"
    environment:
      - MONGODB_URI=mongodb://mongo:27017
      - AMQP_URI=amqp://guest:guest@rabbitmq:5672
    links:
      - mongo
      - rabbitmq
    depends_on:
      mongo:
        condition: service_healthy
      rabbitmq:
        condition: service_healthy
```
Bring it up: `docker-compose up`
(If you change the backend code, run `docker-compose down` first and then `docker-compose up --build`; otherwise the containers keep running the old code. This one cost me a huge amount of debugging time 😂)
Kubernetes
deploy to K3s with k3d
Almost all of this follows "Kubernetes Web App Deployment with k3d" — that article is fantastic!
Creating the cluster

Add /mnt/data under Docker → Preferences… → Resources → File Sharing.

```shell
sudo mkdir -p /mnt/data
k3d cluster create youbike-helper --servers 1 --agents 2 --port "30000-30005:30000-30005@server:0" --volume /mnt/data:/mnt/data
kubectl config get-contexts
kubectl cluster-info
```
Building the backend's Docker image

From the reference: build the Docker images corresponding to the application components and push them to Docker Hub, so they become available to our deployment process.

The `docker build` command reads the instructions in the Dockerfile in order and builds the image automatically.
Register on Docker Hub if you haven't yet~

```shell
docker build -t youbike-helper-backend .
docker tag youbike-helper-backend kanido386/youbike-helper-backend
docker push kanido386/youbike-helper-backend
```
Setting up the database Persistent Volume

From the reference: a pod is ephemeral, so its data is lost if the pod is destroyed. In this example, we use a persistent volume (PV) to hold the application data so it persists beyond the lifetime of the pod.

pv.yaml

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: youbike-helper-pv
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
```
Then on the command line:

```shell
kubectl apply -f pv.yaml
kubectl get pv
```
Persistent Volume Claim

From the reference: persistent volume claims (PVCs) are used by pods to request physical storage.

db-pv-claim.yaml

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: youbike-helper-db-pv-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
```

Then on the command line:

```shell
kubectl apply -f db-pv-claim.yaml
kubectl get pvc
```
What if you want to remove the PV and the PVC?
Deployment

From the reference:
A deployment defines the desired state for a component in our application.
In this case, we declare the desired state of our database, and Kubernetes takes care of sticking to that definition.

db-deployment.yaml

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: youbike-helper-db-deployment
spec:
  selector:
    matchLabels:
      app: youbike-helper
      tier: db
  replicas: 1
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: youbike-helper
        tier: db
    spec:
      containers:
        - name: youbike-helper-db
          image: mongo
          ports:
            - containerPort: 27017
          volumeMounts:
            - mountPath: /mnt/data
              name: youbike-helper-db-volume
      volumes:
        - name: youbike-helper-db-volume
          persistentVolumeClaim:
            claimName: youbike-helper-db-pv-claim
```
From the reference: what is a rolling update strategy?
A rolling update is a gradual process that lets you update your Kubernetes system with only a minor effect on performance and no downtime.
With this strategy, the Deployment picks a Pod running the old version, deactivates it, and creates an updated Pod to replace it.
Then on the command line:

```shell
kubectl apply -f db-deployment.yaml
kubectl get deployments
kubectl get pods -o wide
```
Service
From the reference:
We need a service to make the database available to the backend.
This makes the database pod accessible only from within the cluster, and connections will target port 27017 in the pod.

db-service.yaml

```yaml
apiVersion: v1
kind: Service
metadata:
  name: youbike-helper-db-service
spec:
  selector:
    app: youbike-helper
    tier: db
  ports:
    - protocol: TCP
      port: 27017
      targetPort: 27017
```
Then on the command line:

```shell
kubectl apply -f db-service.yaml
kubectl get services
```
Confused by all the different kinds of ports?
(screenshot from "Kubernetes Services explained | ClusterIP vs NodePort vs LoadBalancer vs Headless Service")
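My own one-glance summary of that video, annotated on a NodePort-style service (the numbers match this project's backend service; treat the comments as a cheat sheet, not official definitions):

```yaml
ports:
  - port: 3000        # the Service's own port, used inside the cluster
                      # (other pods reach it at <service-name>:3000)
    targetPort: 3000  # the container port traffic is forwarded to in the pod
    nodePort: 30000   # opened on every node, reachable from outside the
                      # cluster (default allowed range: 30000-32767)
```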
Setting up RabbitMQ

The Deployment and the Service can be written together in one rabbitmq.yaml:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-rabbitmq-service
  labels:
    app: rabbitmq
spec:
  selector:
    app: rabbitmq
  type: NodePort
  ports:
    - name: rabbitmq
      port: 5672
      targetPort: 5672
      nodePort: 30001
    - name: rabbitmq-management
      port: 15672
      targetPort: 15672
      nodePort: 30002
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-rabbitmq-deployment
  labels:
    app: rabbitmq
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rabbitmq
  template:
    metadata:
      labels:
        app: rabbitmq
    spec:
      containers:
        - name: rabbitmq
          image: rabbitmq:3.9.18-management-alpine
          ports:
            - containerPort: 5672
            - containerPort: 15672
          resources:
            limits:
              cpu: 500m
              memory: 512Mi
            requests:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 3
            tcpSocket:
              port: 5672
          readinessProbe:
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 3
            tcpSocket:
              port: 5672
```
Then on the command line:

```shell
kubectl apply -f rabbitmq.yaml
kubectl get deployment
kubectl get service
```
Handling the backend's environment variables

From the reference, there are three ways to provide environment variables when creating the backend's deployment:

- defining a variable's value directly in the deployment definition
- getting a variable's value from a separate configuration object called a ConfigMap
- getting a variable's value from a separate configuration object called a Secret

backend-config.yaml

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: backend-config
data:
  MONGODB_URI: youbike-helper-db-service
  AMQP_URI: my-rabbitmq-service
```
From the reference:
Note that we provide the name of the service created earlier.
That's because Kubernetes handles name resolution, so the requests reach the database pod without problems.
Then on the command line:

```shell
kubectl apply -f backend-config.yaml
kubectl get configmap
kubectl describe configmap backend-config
```

(This project doesn't need a Secret, so I didn't set one up; see "Next, let's create a Secret." in the reference for how.)
Deployment

backend-deployment.yaml

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: youbike-helper-backend-deployment
spec:
  selector:
    matchLabels:
      app: youbike-helper
      tier: backend
  replicas: 3
  template:
    metadata:
      labels:
        app: youbike-helper
        tier: backend
    spec:
      containers:
        - name: youbike-helper-backend
          image: kanido386/youbike-helper-backend:latest
          env:
            - name: MONGODB
              valueFrom:
                configMapKeyRef:
                  name: backend-config
                  key: MONGODB_URI
            - name: MONGODB_URI
              value: mongodb://$(MONGODB)
            - name: AMQP
              valueFrom:
                configMapKeyRef:
                  name: backend-config
                  key: AMQP_URI
            - name: AMQP_URI
              value: amqp://$(AMQP)
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "1024Mi"
          ports:
            - containerPort: 3000
```
Then on the command line:

```shell
kubectl apply -f backend-deployment.yaml
kubectl get deployments
kubectl exec -it [pod] /bin/ash
```
Service
From the reference:
If we needed to connect to the backend pods directly, how would we find out and keep track of the IP addresses to connect to?
Thanks to this abstraction, we don't need to worry about any of that.
The service "knows" where to find all the pods matching its selector, and any request that reaches the service will reach one of those pods.
Even if a pod dies or a whole node crashes, Kubernetes keeps the desired number of pods running, and the service still lets us reach them.

backend-service.yaml

```yaml
apiVersion: v1
kind: Service
metadata:
  name: youbike-helper-backend-service
spec:
  selector:
    app: youbike-helper
    tier: backend
  type: NodePort
  ports:
    - name: youbike-helper-backend
      port: 3000
      targetPort: 3000
      nodePort: 30000
```
Then on the command line:

```shell
kubectl apply -f backend-service.yaml
kubectl get services
```
Trying out Ingress

From the reference: this lets us access the application from outside and makes sure requests reach the correct destination.

ingress.yaml

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: youbike-helper-ingress
  annotations:
    ingress.kubernetes.io/ssl-redirect: "false"
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: youbike-helper-backend-service
                port:
                  number: 3000
```
From the reference: the ingress controller will start acting as an HTTP reverse proxy.

Then on the command line:

```shell
kubectl apply -f ingress.yaml
kubectl get ingress
```
(In the end I didn't go through the Ingress — I connected via the NodePort instead 😂) (Connecting through a NodePort like that is actually pretty insecure!)
Testing it out

From the reference: you can play with the application and then check the logs. (This is handy for debugging, too!)

```shell
kubectl get pods
kubectl logs [pod]
```
I reached the page at http://localhost:30000 successfully 🎉 But when I clicked "Find nearby YouBike stations!", nothing happened… Ah, it turned out I had forgotten to update the URL the page uses to call the backend API… (http://localhost:3000/search should have been http://localhost:30000/search — though there may well be a better fix 🤔)
Fixing the problem

Changing the code means going through the following steps:

Turn that URL into an environment variable.

.env — for local testing (pretending I never set up k3d), add:

```
SEARCH_API_URL='http://localhost:3000/search'
```

app.js — the HTML apparently can't read environment variables directly, so the backend has to pass the value in:

```javascript
app.get('/', (req, res) => {
  res.render('index', {
    SEARCH_API_URL: process.env.SEARCH_API_URL
  });
});
```

index.ejs — following "Accessing EJS variable in Javascript logic", add this inside the `<script>`:

```javascript
const response = await axios.post('<%- SEARCH_API_URL %>', {
  longitude: position.coords.longitude,
  latitude: position.coords.latitude
});
```
Rebuild the Docker image and push it:

```shell
docker build -t youbike-helper-backend .
docker tag youbike-helper-backend kanido386/youbike-helper-backend
docker push kanido386/youbike-helper-backend
```
Update backend-config.yaml and backend-deployment.yaml.

In backend-config.yaml, add:

```yaml
SEARCH_API_URL: 'http://localhost:30000/search'
```

In backend-deployment.yaml, add:

```yaml
- name: SEARCH_API_URL
  valueFrom:
    configMapKeyRef:
      name: backend-config
      key: SEARCH_API_URL
```
Re-apply those two files:

```shell
kubectl delete -f backend-config.yaml && kubectl apply -f backend-config.yaml
kubectl delete -f backend-deployment.yaml && kubectl apply -f backend-deployment.yaml
```
Open http://localhost:30000/ again and click "Find nearby YouBike stations!" — it works 😎
Kubernetes Dashboard

The reference article includes a walkthrough of the Kubernetes Dashboard, so I won't repeat it here~

I hope you got something out of this article. See you in the next one 😃