Sunday 24 May 2020

Begin Machine Learning as a software engineer

In this post, I am writing about how to start applying machine learning (ML) in software as a software engineer.
The picture below shows that in ML, we build an ML model using data and results. This stands in contrast to traditional programming, where we write a program that does the actual computation.
After the model is built, we use it to make predictions.
For sure, we can always create our own ML model. To do that, we have to gain a rather good understanding of ML algorithms. If we are not able or not willing to create our own ML model (such as when we want to apply ML in a practical solution), we can reuse existing models from ML libraries.
In ML programming, we choose an existing model, build the model architecture (such as a classifier), feed training data to the model, and use the trained model to make decisions on newly arrived data.
The difference between an algorithm and a model is:
This is the algorithm of linear regression with one variable: y = w₀ + w₁x
This is the model after applying data and results: y = (5) + (-2)x
The purpose of training a model is to adjust the model parameters so that the model fits the user-supplied data well.
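As a minimal sketch of this idea (using scikit-learn, with made-up data points that lie on y = 5 - 2x):

import numpy as np
from sklearn.linear_model import LinearRegression

# made-up training data lying on y = 5 - 2x
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 5 - 2 * X.ravel()

# training adjusts the parameters (intercept and coefficient) to fit the data
model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)  # approximately 5.0 and [-2.0]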
We choose the Keras library as a starting point. Keras provides a Sequential model. For this article, we look at bank customer data and decide whether a customer would stop using the bank's services. This is a classification problem, and we will use the Sequential model to do the classification.
In part one, we do data preprocessing. Firstly, we import the numpy and pandas libraries. Pandas is a data manipulation library.
import numpy as np
import pandas as pd
Secondly, we import the dataset using pandas.
dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values
Then, we encode the categorical data (label encoding for the gender column, one-hot encoding for the geography column). Note that the categorical_features parameter of OneHotEncoder has been removed in recent scikit-learn releases, so we use ColumnTransformer instead.
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])  # encode gender as 0/1
ct = ColumnTransformer([('geography', OneHotEncoder(), [1])], remainder = 'passthrough')
X = ct.fit_transform(X)
X = X[:, 1:]  # drop one dummy column to avoid the dummy variable trap
We split the dataset into the training set and test set.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
After that, we apply feature scaling to the data so that each feature is standardised (zero mean, unit variance).
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
In part two, we will make an artificial neural network. Firstly, we import the Keras library.
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
Secondly, we use the Sequential model to build an ML classifier. A Sequential model is a model with a sequence of layers: input, hidden, and output layers.
classifier = Sequential()
We add the input layer and the first hidden layer.
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))
We add the second hidden layer.
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
Lastly, we add the output layer.
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
Now, compile the classifier. The adam optimizer is an adaptive moment estimator (the n-th moment of a random variable is the expected value of the variable raised to the power n). The optimizer decides how the network weights are updated.
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
For binary classification, binary_crossentropy is the suitable loss function.
We fit the training set to the classifier.
classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)
In part three, we use the classifier to make predictions and evaluate the model. Firstly, we feed the test data to the classifier.
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
Secondly, we use the confusion matrix to evaluate the model.
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
When I examine the cm value, it shows that out of 2000 test samples, the model makes correct predictions 1595 times (an accuracy of about 80%).
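The accuracy can be computed directly from the confusion matrix, since the correct predictions sit on its diagonal:

correct = cm[0, 0] + cm[1, 1]  # true negatives + true positives
print(correct / cm.sum())      # about 0.80 for 1595 correct out of 2000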
The python and CSV files can be found at:

Sunday 2 February 2020

Bug descriptions and solutions

This post is about bugs and their solutions on embedded Linux.

Bug 1
Remove internet connectivity from the Gateway device. The first time a USB device containing swupdate.swu is plugged in, no firmware update is performed. The second time the USB device is plugged in, a firmware update occurs. When internet connectivity is removed, it should not do a firmware update at all, so this is a bug.

Solution:
Check the gateway system time against the x.509 certificate validity period. If the system time is outside the x.509 validity period, swupdate does not run, because the swupdate software checks the x.509 certificate. The problem has nothing to do with internet connectivity.

# openssl x509 -in x509.crt -text -noout
Validity
    Not Before: ...
    Not After : ...
# date
-- shows the system time

Compare the system time against the x509 certificate validity period.

Bug 2
When using the mobile app to connect to the Gateway device via BLE, multiple occurrences of the D-Bus error message "Rejected send message" are seen in the journal log. This error message should not happen.

Solution:
Using dbus-monitor, we can see the D-Bus system messages. There is a method call from BlueZ to the config daemon, and there is a signal and a method return in response to the method call. The method return is rejected by the D-Bus daemon, and the D-Bus error message is printed out.

The gateway sends a GATT Indicate to the mobile app, and the mobile app responds with a GATT Confirm. The Indicate is used to transfer data; the Confirm is the acknowledgement. When the mobile app sends a Confirm to the gateway, BlueZ receives it and sends a D-Bus confirmation to the config daemon, which then sends the next chunk of data to BlueZ. The method return is also sent out by the config daemon. The config daemon uses the dbus-python library, and the solution is in that library: we have to check the D-Bus must-not-reply flag on the incoming message. If the flag is set, we stop the library from sending out the method return.
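A hedged sketch of the idea in dbus-python terms (get_no_reply() is the dbus-python accessor for that message flag; the wrapper function below is illustrative, not the actual library code):

def maybe_send_method_return(connection, message, reply):
    # if the caller set the no-reply-expected flag, sending a method return
    # is what makes dbus-daemon log "Rejected send message"
    if message.get_no_reply():
        return
    connection.send_message(reply)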

Bug 3
If the Gateway is physically connected to ethernet and also has wifi, and the ethernet link is unable to come up, the gateway is not able to use wifi. The gateway should automatically fail over to wifi.

Solution
Use NetworkManager to check the link: if the ethernet link is not up, automatically switch to wifi. The NetworkManager config file provides checking of the link status.
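A sketch of the relevant section in NetworkManager.conf (the URI and interval below are example values; NetworkManager periodically fetches the URI to verify that the active link really has connectivity):

[connectivity]
uri=http://example.com/nm-check
interval=60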

Bug 4
When running a high-throughput network application over ethernet, the CPU is loaded to over 90%. This slows down the response time of other processes.

Solution:
The ethernet driver was using interrupt-driven mode. Switching to NAPI, and implementing TSO so that segmentation of the packets is handled by the NIC, reduces the CPU load.
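For example, assuming the interface is eth0 and the NIC supports the offload, TSO can be inspected and enabled from userspace with ethtool:

# show current offload settings
ethtool -k eth0
# enable TCP segmentation offload
ethtool -K eth0 tso on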

Thursday 4 April 2019

Security Token Offering

Concept

Security tokens are cryptographic blockchain based tokens that represent financial assets such as bonds, notes, debentures, shares, options, and private equities; as well as tokenised real assets.

It allows fractional ownership of assets, and it meets regulatory scrutiny.

Companies use an STO to raise money from investors. STO investors are promised gains in the form of dividends, rewards (interest), or an increase in the value of the company.


STO standards

-  ST-20, from Polymath, basically ERC20 with an investor whitelist
-  ERC1400, draft standard, tranches of security (different filings of the same underlying security, e.g. Reg D for U.S. investors and Reg S for foreign investors), ERC20 compatible, incorporates ERC1410
-  ERC1410, partially fungible token (organises tokens into a set of partitions)
-  ERC1404, draft standard, with transfer restrictions, ERC20 compatible
-  R-Token, from Harbor, ERC20 with additional compliance checking

ERC1400
-  Transfer of tokens can be reversed
-  Token balance includes metadata - shareholder rights , other restrictions
-  Token separated into tranches
-  Standard interface to query the validity of a transfer
-  Standard event for redemption and issuance
-  ST-20 results in ERC1400

ERC1404
-  Maintain a whitelist of investor addresses
-  Enforce complex restrictions
-  Support branded standards, such as ST-20 and R-token

STO issuance platforms
- Polymath, with DAPP for STO token issuance, using Poly tokens
- Harbor, using R-tokens
- Securitize, using DS protocols, issues security tokens on XRP and Ethereum
- Swarm, using src20 protocol
- Securrency, using CAT-20 token, with KYC and AML engines, compatible with any blockchain
- tZERO, using tZERO token

Polymath STO steps
- Register ticker symbol
- Deploy smart token contract
- Add investor to whitelist
- Mint tokens for shareholder
- Setup STO parameters - start date, end date, supply cap
- Start the STO - deploy the STO contract
It has a modular approach: STO module, Transfer Manager module, etc.
It uses smart contract methods, such as the verifyTransfer method in the Transfer Manager module, to validate transfers.



Wednesday 6 March 2019

Rails setup problems and solutions

After setting up Rails, you can run 'rails -v' and get 'Rails 5.2.2'.

You copy or clone an existing Rails project and cd to the project folder. You run 'rails -v' and get the error:
/home/<username>/.rbenv/versions/2.5.3/lib/ruby/2.5.0/rubygems/core_ext/kernel_require.rb:59:in `require': cannot load such file -- bundler/setup (LoadError)

You run 'bundle install' and get the error:
/home/<username>/.rbenv/versions/2.5.3/lib/ruby/2.5.0/rubygems.rb:289:in `find_spec_for_exe': can't find gem bundler (>= 0.a) with executable bundle (Gem::GemNotFoundException)

Then, to solve the problem, you run 'gem install bundler'.
You run 'bundle install' again, but you still get the same error.

The solution is:
Open Gemfile.lock in the project folder and check the BUNDLED WITH version. If it is 1.16.1, you need to run:
'gem install bundler -v 1.16.1'
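For reference, the version is recorded at the bottom of Gemfile.lock in a section like this:

BUNDLED WITH
   1.16.1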

Then, you can run 'bundle install'. It will be successful.

Similarly, you can run 'rails -v' and it will show the version. The version is the one specified in Gemfile.lock.



Friday 19 October 2018

Elliptic Curve Cryptography

An elliptic curve is a set of points that satisfies a math equation:
   y² = x³ + ax + b

(The graph of the curve is omitted here.) The curve has interesting properties:
- any point on the curve can be reflected across the x-axis and remains on the curve
- any non-vertical line can intersect the curve in at most 3 points
- point operations are easy to compute forward and hard to reverse, the property of a trap-door function


An elliptic curve crypto-system can be defined by picking a prime number as a maximum (the modulus), a curve equation, and a public point on the curve.


A private key is a number N, and the public key is the public point dotted with itself N times (that is, the point added to itself N times, a scalar multiplication by N).

Computing the private key from the public key requires solving the elliptic curve discrete logarithm problem, analogous to the classic discrete logarithm y = g^x mod q.

The discrete logarithm function is hard to invert: nobody can recover x from y (think of y as the public key and x as the private key).

It is a good trap-door function.

It can obtain the same level of security with a smaller key size (compared to RSA).
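To make the point arithmetic concrete, here is a minimal sketch of curve addition and double-and-add scalar multiplication over a tiny prime field (toy parameters, not secp256k1; requires Python 3.8+ for pow(x, -1, p)):

def ec_add(P, Q, a, p):
    # add two points on y^2 = x^3 + ax + b over F_p (None = point at infinity)
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p  # tangent slope
    else:
        m = (y2 - y1) * pow(x2 - x1, -1, p) % p         # chord slope
    x3 = (m * m - x1 - x2) % p
    return (x3, (m * (x1 - x3) - y1) % p)

def scalar_mult(N, G, a, p):
    # public key = G added to itself N times (double-and-add)
    R = None
    while N:
        if N & 1:
            R = ec_add(R, G, a, p)
        G = ec_add(G, G, a, p)
        N >>= 1
    return R

# toy curve y^2 = x^3 + 7 over F_97; (1, 28) is on the curve since 28^2 ≡ 1 + 7 (mod 97)
print(scalar_mult(5, (1, 28), 0, 97))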

For Bitcoin, secp256k1 names the parameters of the elliptic curve used in Bitcoin public key cryptography.

secp256k1 details:

y² = x³ + ax + b over the finite field 𝔽P, defined by T = (P, a, b, G, n, h), where:

a = 0, b = 7, so y² = x³ + 7
P = a large prime number: 2²⁵⁶ − 2³² − 2⁹ − 2⁸ − 2⁷ − 2⁶ − 2⁴ − 1
G = 02 79BE667E F9DCBBAC 55A06295 CE870B07 029BFCDB 2DCE28D9 59F2815B 16F81798
h = 01
n = FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFE BAAEDCE6 AF48A03B BFD25E8C D0364141

Saturday 29 September 2018

SHA Hash Algorithm

Important property:
  • One way from input to hash value, cannot reverse
  • It is computationally infeasible to find two different inputs that generate the same hash value

Actual SHA example:
  • Choose a word to hash, eg CRYPTO
  • Convert the word to ASCII
         CRYPTO becomes 67 82 89 80 84 79
  • Convert from ASCII to binary
         01000011-01010010-01011001-01010000-01010100-01001111 
        (it becomes a 48 bit message)
  • Join and add 1 at the end
         0100001101010010010110010101000001010100010011111
  • Add zeros until the message length is congruent to 448 mod 512; the 48-bit message with the appended 1 needs 399 zeros added to the end
  • Add the original message length to the remaining 64-bit field (the field left over after the 448 modular arithmetic), so the message becomes 16 sections of 32 bits
  • 01000011010100100101100101010000
    01010100010011111000000000000000
    00000000000000000000000000000000
    00000000000000000000000000000000
    00000000000000000000000000000000
    00000000000000000000000000000000
    00000000000000000000000000000000
    00000000000000000000000000000000
    00000000000000000000000000000000
    00000000000000000000000000000000
    00000000000000000000000000000000
    00000000000000000000000000000000
    00000000000000000000000000000000
    00000000000000000000000000000000
    00000000000000000000000000000000
    00000000000000000000000000110000
  • Transform the 16 x 32 message into 80 words using a step loop function. Firstly, do ((14 XOR 9) XOR 3) XOR 1 (the 14th, 9th, 3rd and 1st words), and we get
         01000011010100100101100101010000
  • Rotate left, we get
         10000110101001001011001010100000
  • Process is repeated until there are 80 words (one word = 32 bits)
  • The 1st, 3rd, 9th and 14th words are chosen by the algorithm (a runnable sketch follows below):
  • for i from 16 to 79
          w[i] = (w[i-3] xor w[i-8] xor w[i-14] xor w[i-16]) leftrotate 1

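A minimal Python sketch of this expansion step, treating each word as a 32-bit integer:

  def expand(words):
      # words: the 16 initial 32-bit block words; extend to 80 words
      w = list(words)
      for i in range(16, 80):
          x = w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16]
          w.append(((x << 1) | (x >> 31)) & 0xFFFFFFFF)  # leftrotate 1
      return w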
  • Run a set of operations on the 80 words in specific order using the five variables

  • H0 - 01100111010001010010001100000001
    H1 - 11101111110011011010101110001001
    H2 - 10011000101110101101110011111110
    H3 - 00010000001100100101010001110110
    H4 - 11000011110100101110000111110000
    • The operations are combinations of AND, OR, XOR and NOT operators, together with additions and rotations
    • The five initial values H0-H4 are fixed constants defined in the SHA-1 specification
    • The result: we get five variables
    H0 - 01000100101010010111000100110011
    H1 - 01010000111001010011100001011000
    H2 - 11110000010110000100011000111101
    H3 - 01001011111101111111000111100101
    H4 - 01000010110110011100101001001011

    • Convert the five variables to hex
    H0 - 44a97133
    H1 - 50e53858
    H2 - f058463d
    H3 - 4bf7f1e5
    H4 - 42d9ca4b
    • Join the variables together, get the hash
    44a9713350e53858f058463d4bf7f1e542d9ca4b
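    We can check the worked example against Python's built-in implementation; the printed digest should match the hash joined above:

    import hashlib
    print(hashlib.sha1(b"CRYPTO").hexdigest())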

    Tuesday 3 July 2018

    Run Bitcoind as a Docker Service

    This is a continuation from the previous blog article http://embedded-design-vic.blogspot.com/2018/06/run-bitcoind-in-docker-container.html


    1) append this line to the Dockerfile,
       so that bitcoind is started automatically when the container image is run
    ENTRYPOINT ./bin/bitcoind -datadir=node -daemon && /bin/bash

    2) rebuild the image
     docker build -t bitcoin-docker .

    3) tag image for upload to registry
    docker tag <image> username/repository:tag

    4) upload tagged image to registry
    docker push username/repository:tag 

    5) add a docker-compose.yml file
    version: "3"
    services:
      web:
        # replace username/repo:tag with your name and image details
        image: chaintope99/bitcoin:dev
        deploy:
          replicas: 5
          resources:
            limits:
              cpus: "0.1"
              memory: 50M
          restart_policy:
            condition: on-failure
        ports:
          - "5000:12001"
        networks:
          - webnet
    networks:
      webnet:

    6) init the swarm; a swarm is a group of machines joined as a cluster,
    and the swarm manager uses several strategies to run containers
    docker swarm init

    7) run the specified docker compose file
    docker stack deploy -c <composefile> <appname>
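    To verify that the replicas are running (using the <appname> from the deploy step):
    docker stack ps <appname>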

    8) use curl to access the dockerised bitcoind service
    curl --user username:password --data '{"method": "getinfo"}' http://127.0.0.1:5000

    Monday 25 June 2018

    101 confirmations

    Question:
    After mining, I run the listtransactions command and can see the address has an amount of 50 coins in each transaction. Why does the getbalance command return 0?

    $ src/bitcoin-cli -datadir=../datadir listtransactions
    [
      {
        "account": "",
        "address": "15454L2G44NuZoZrE2HdBwMmCchQVcGvKm",
        "category": "immature",
        "amount": 50.00000000,
        "label": "",
        "vout": 0,
        "confirmations": 2,
        "generated": true,
        "blockhash": "00000000df3ae70ae6c5f3eb24b0dca0c37ef55b76fad5396f1386aaab2b0027",
        "blockindex": 0,
        "blocktime": 1527831787,
        "txid": "24db59cbb12a8ffd3f1421931f2a6a2293b1b6437021af88119da95937c8f737",
        "walletconflicts": [
        ],
        "time": 1527831787,
        "timereceived": 1527831828,
        "bip125-replaceable": "no"
      }, 
      {
        "account": "",
        "address": "15454L2G44NuZoZrE2HdBwMmCchQVcGvKm",
        "category": "immature",
        "amount": 50.00000000,
        "label": "",
        "vout": 0,
        "confirmations": 1,
        "generated": true,
        "blockhash": "00000000eb85f984adc2905671aaa8663d505c0ee71fb5a0d47996f76a12f336",
        "blockindex": 0,
        "blocktime": 1527831949,
        "txid": "d47efd3725fb3be3072131ea6612b2e6581a876f11760e844e91ecd8b414f22e",
        "walletconflicts": [
        ],
        "time": 1527831949,
        "timereceived": 1527831973,
        "bip125-replaceable": "no"
      }
    ]
    $ src/bitcoin-cli -datadir=../datadir getbalance ""
    0.00000000

    Generated coins cannot be spent until the generation transaction has gone through 101 confirmations. Transactions that try to spend generated coins before then will be rejected. The 101 confirmations are the maturity time.
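    In regtest mode this is easy to demonstrate: generating 99 more blocks gives the first coinbase transaction 101 confirmations, so its 50 coins mature and show up in the balance (using the generate RPC available in this version of Bitcoin Core):

    $ src/bitcoin-cli -datadir=../datadir generate 99
    $ src/bitcoin-cli -datadir=../datadir getbalance ""
    50.00000000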

    The reason for this is that sometimes the block chain forks, blocks that were valid become invalid, and the mining reward in those blocks is lost. That is an unavoidable part of how Bitcoin works. If there was no maturation time, then whenever a fork happened, everyone who received coins that were generated on an unlucky fork (possibly through many intermediaries) would have their coins disappear, even without any sort of double-spend or other attack. On long forks, people could find coins disappearing from their wallets, even though there is no one actually attacking them and they had no reason to be suspicious of the money they were receiving. 

    For example, without a maturation time, a miner might deposit 50 BTC into an EWallet, and if the user withdraws money from a completely unrelated account on the same EWallet, the withdrawn money might just disappear if there is a fork and he/she is unlucky enough to withdraw coins that have been "tainted" by the miner's now-invalid coins. 

    Due to the way this sort of taint tends to "infect" transactions, far more than 50 BTC per block would be affected. Each invalidated block could cause transactions collectively worth hundreds of bitcoins to be reversed. The maturation time makes it impossible for anyone to lose coins by accident like this as long as a fork doesn't last longer than 100 blocks. 

    If a fork does last longer than 100 blocks, then the damage caused by invalidated transactions would likely be a huge disaster. This is unlikely to happen, as something would have to be seriously wrong with Bitcoin or the Internet for a fork to last this long.



    Friday 22 June 2018

    Run Bitcoind in Docker container


    Use a Docker container to run Bitcoin Core

    0) Install Docker
     Docker can be installed on Windows, Linux, MacOS

    1) Create new directory for running bitcoin core in Docker
     mkdir bitcoin-core-docker
     cd bitcoin-core-docker

    2) Create new dir  to store bitcoin.conf
     mkdir node; cd node

    3) Create bitcoin.conf and add the contents
    server=1
    regtest=1
    port=12000
    rpcport=12001
    rpcallowip=0.0.0.0/0
    rpcuser=username
    rpcpassword=password
    daemon=1
    txindex=1

    4) Build the Docker container
      cd .. (back to bitcoin-core-docker)
      create a Dockerfile

    5) Add contents to the Dockerfile
    # Dockerfile must start with a FROM instruction
    # FROM instruction specifies the Base Image from which you are building
    # FROM <image>[:<tag>]
    FROM ubuntu:16.04

    ENV BTCVERSION=0.15.1

    ENV BTCPREFIX=/bitcoin-prefix

    RUN apt-get update && apt-get install -y git build-essential wget pkg-config curl libtool autotools-dev automake libssl-dev libevent-dev bsdmainutils libboost-system-dev libboost-filesystem-dev libboost-chrono-dev libboost-program-options-dev libboost-test-dev libboost-thread-dev

    WORKDIR /

    RUN mkdir -p /berkeleydb

    # clone the bitcoin source code (0.15 branch)
    RUN git clone -b 0.15 --single-branch  https://github.com/bitcoin/bitcoin.git

    WORKDIR /berkeleydb

    RUN wget http://download.oracle.com/berkeley-db/db-4.8.30.NC.tar.gz && tar -xvf db-4.8.30.NC.tar.gz && rm db-4.8.30.NC.tar.gz && mkdir -p db-4.8.30.NC/build_unix/build

    ENV BDB_PREFIX=/berkeleydb/db-4.8.30.NC/build_unix/build
    WORKDIR /berkeleydb/db-4.8.30.NC/build_unix

    RUN ../dist/configure --disable-shared --enable-cxx --with-pic --prefix=$BDB_PREFIX

    RUN make install

    RUN apt-get update && apt-get install -y libminiupnpc-dev libzmq3-dev  libprotobuf-dev protobuf-compiler libqrencode-dev

    WORKDIR /bitcoin

    RUN git checkout v${BTCVERSION} && mkdir -p /bitcoin/bitcoin-${BTCVERSION}

    WORKDIR /bitcoin

    RUN ./autogen.sh

    RUN ./configure CPPFLAGS="-I${BDB_PREFIX}/include/ -O2" LDFLAGS="-L${BDB_PREFIX}/lib/ -static-libstdc++" --prefix=${BTCPREFIX}

    RUN make

    RUN make install DESTDIR=/bitcoin/bitcoin-${BTCVERSION}

    RUN mv /bitcoin/bitcoin-${BTCVERSION}${BTCPREFIX} /bitcoin-${BTCVERSION} && strip /bitcoin-${BTCVERSION}/bin/* && rm -rf /bitcoin-${BTCVERSION}/lib/pkgconfig && find /bitcoin-${BTCVERSION} -name "lib*.la" -delete && find /bitcoin-${BTCVERSION} -name "lib*.a" -delete

    WORKDIR /

    RUN tar cvf bitcoin-${BTCVERSION}.tar bitcoin-${BTCVERSION}

    # copy bitcoin.conf
    ADD . /bitcoin-${BTCVERSION}

    # expose rpc port for the node to allow access from outside container
    EXPOSE 12001

    WORKDIR /bitcoin-${BTCVERSION}

    6) build the docker image
       docker build -t bitcoin-docker .
    • The -t flag sets a name for the image.
    • The . tells Docker to look for the Dockerfile in the current directory

     6.1) list the built images:
      docker images

    7) run the image in container
    docker run -it -p 5000:12001 bitcoin-docker
    • -it is required for interactive processes (like a bash shell)
    • -p maps host port 5000 to the container’s exposed port 12001, which is where the Bitcoin RPC will be listening

    If everything works, Docker presents a bash shell. Both the node directory and the bitcoin.conf file were copied into the container by the ADD instruction in the Dockerfile, so they will be present in the current working directory.

    In the bash shell, run
    7.1) bitcoind -datadir=node -daemon
    7.2) bitcoin-cli -datadir=node getinfo

    8) Connect to bitcoind from outside Docker (open second terminal window)
    curl --user username:password --data '{"method": "getinfo"}' http://127.0.0.1:5000

    9) In first terminal window, run this to stop bitcoind
     bitcoin-cli -datadir=node stop
    9.1) In second terminal window, exit the docker container
     exit

    Use Docker to run Bitcoin core as service
    TBC…

    PS: we can push the image to Docker Hub and share it with others

    # tag the image

    docker tag bitcoin-docker <username>/bitcoin:custom
    # push to docker hub

    docker push <username>/bitcoin:custom
    # run the image, if image is not available locally, docker pull from repo

    docker run -it -p 5000:12001 <username>/bitcoin:custom

    Wednesday 6 June 2018

    Segregated Witness (Segwit)

    Segwit
    Introduction
    Segregated Witness (Segwit) [1], proposed in BIP 141 [5], was activated on August 24, 2017. The contributions of Segwit [2]:
    1) solve transaction malleability [3]
    2) mitigate block size limitation problem
    Problem
    1) Transaction malleability:
    When a transaction is signed, the signature (script_sig) does not cover all the data in the transaction. Specifically, the script_sig is part of the transaction, but the signature itself is stored in script_sig, so the signature cannot cover script_sig. The script_sig is added after the transaction is created and signed.


    The script_sig is the tampering point: if script_sig changes, the TXID will change. The script_sig can be changed by anyone who has access to the corresponding private keys.
    2) Block size limitation problem
    Originally, Bitcoin did not have a limit on block size. This allowed attackers to create blocks of very large size, so a 1 MB block size limit was introduced. The 1 MB value was a tradeoff between network propagation times, node capability, the number of transactions that can fit into one block, etc. [4].
    Proposal
    Segwit defines a new structure called the witness. The signature and redeem script are moved into this structure, which is not counted against the 1 MB block size limit.
    1) Transaction structure

    The conventional transaction structure, with script_sig left empty, is used in the TXID calculation. So even if script_sig is tampered with, the TXID does not change.
    2) Lock/Unlock script
    For a conventional P2PKH:
    scriptPubKey (lock script): OP_DUP OP_HASH160 <pubkey hash> OP_EQUALVERIFY OP_CHECKSIG
    scriptSig (unlock script): <sig> <pubkey>
    For a Segwit P2WPKH:
    scriptPubKey (lock script): 0 <pubkey hash>
    scriptSig (unlock script): empty
    witness: <sig> <pubkey>
    In the scriptPubKey there are no opcodes; only two data items (the version and the hash) are pushed. When a lock script of this pattern is seen, it is evaluated like a conventional P2PKH script, except that the signature and public key are obtained from the witness instead of the scriptSig.
    3) Witness extension method
    In the extension method, Segwit builds on the approach of OP_CLTV (OP_NOP2) and OP_CSV (OP_NOP3), which were introduced by redefining NOP opcodes.
    The witness structure
    <witness version>
    <witness program>
    For Segwit, the witness version is 0; the witness program is P2WPKH if the hash length is 20 bytes and P2WSH if it is 32 bytes.
    4) Address format
    Segwit uses the Bech32 address format. It is based on a BCH code instead of the previously used Base58 encoding, so that error correction is possible [6]. There is no distinction between uppercase and lowercase letters, and QR codes are also more compact.

    5) Increase of block size
    The increase of block size from Segwit depends on the types of transaction.
    • Before Segwit
    block data ≦ 1,000,000 bytes
    • After Segwit
    block weight = base size × 3 + total size (a worked example follows after this list)
    base size: size of the transaction data not including the witness
    total size: size of the transaction data including the witness
    block weight ≦ 4,000,000 weight units
    • If all transactions in the block are non-Segwit transactions, the block size is 1 MB, the same as before
    • If all transactions in the block are P2WPKH transactions with 1 input and 2 outputs, the block size is about 1.6 MB
    • If the block holds one huge transaction whose inputs are all P2WPKH, the block size is about 2.1 MB
    • If the block consists of P2WSH transactions with huge witnesses (all 15-of-15 multisig, etc.), the block size is about 3.7 MB
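    As a quick worked example of the weight formula (assumed sizes):

    # a transaction of 300 bytes in total, of which 100 bytes is witness data
    base_size = 200                      # bytes, not counting the witness
    total_size = 300                     # bytes, including the witness
    weight = base_size * 3 + total_size  # 900 weight units of the 4,000,000 limit
    print(weight)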
    6) Changes in signature data
    The conventional message digest items are based on the conventional transaction structure. The message digest items are:
    version, txin count, txins, txout count, txouts, locktime, sighash type
    For Segwit, the message digest items are:
    version
    hashPrevouts: hash of all input outpoints
    hashSequence: hash of all input sequences (TxIns)
    outpoint: previous output (32-byte TXID + 4-byte index) of the TxIn
    script code
    value: amount of coins held by the TxIn
    sequence: sequence of the TxIn
    hashOutputs: hash of all outputs (TxOuts)
    locktime
    sighash type
    Segwit changes the calculation of the transaction hash for signatures, so that each byte of a transaction is hashed at most twice [7]. The sighash calculation cost is reduced.
    7) Witness commitment in Coinbase transaction
    For a conventional transaction, the merkle root is calculated using the original Tx format.

    Segwit adds the witness commitment: a merkle tree is constructed over the transaction data including the witness signature data, and that merkle root is stored in one of the coinbase transaction outputs to make a commitment that includes the witness data.


    Effects and Challenges
    Segwit changes the consensus rules, the P2P messages, and the address format of the Bitcoin protocol. It is amazing that Segwit could be realised as a soft fork.
    Segwit introduces the witness extension method. It eliminates transaction malleability and increases the block size. The actual block size increase depends on the transaction type.
    References
    1. https://en.bitcoin.it/wiki/Segregated_Witness
    2. https://en.wikipedia.org/wiki/SegWit
    3. https://en.bitcoin.it/wiki/Transaction_malleability
    4. https://en.bitcoin.it/wiki/Block_size_limit_controversy
    5. https://github.com/bitcoin/bips/blob/master/bip-0141.mediawiki
    6. https://en.wikipedia.org/wiki/BCH_code
    7. https://bitcoincore.org/en/2016/01/26/segwit-benefits/#linear-scaling-of-sighash-operations