Image recognition, translation and speech synthesis - 3in1 Web API


In this tutorial, I will demonstrate how to use Google's Vision and Translation APIs to build a useful application. The tutorial is divided into two parts:

Part I   Building a Web API service that handles image labeling and translation into different languages.
Part II  Consuming this API from an Android application.

Using the code

We will start by creating a new Web API project. Start Visual Studio and choose New Project -> C# -> Web -> ASP.NET Web Application -> Empty. Check Web API, and check "Host in the cloud" so you can publish this project later.

packages.config will contain all the libraries we need for this project.

    <?xml version="1.0" encoding="utf-8"?>
    <packages>
      <package id="BouncyCastle" version="1.7.0" targetFramework="net45" />
      <package id="Google.Apis" version="1.19.0" targetFramework="net45" />
      <package id="Google.Apis.Auth" version="1.19.0" targetFramework="net45" />
      <package id="Google.Apis.Core" version="1.19.0" targetFramework="net45" />
      <package id="Google.Apis.Translate.v2" version="" targetFramework="net45" />
      <package id="Google.Apis.Vision.v1" version="" targetFramework="net45" />
      <package id="GoogleApi" version="2.0.13" targetFramework="net45" />
      <package id="log4net" version="2.0.3" targetFramework="net45" />
      <package id="Microsoft.AspNet.WebApi" version="5.2.3" targetFramework="net45" />
      <package id="Microsoft.AspNet.WebApi.Client" version="5.2.3" targetFramework="net45" />
      <package id="Microsoft.AspNet.WebApi.Core" version="5.2.3" targetFramework="net45" />
      <package id="Microsoft.AspNet.WebApi.WebHost" version="5.2.3" targetFramework="net45" />
      <package id="Microsoft.CodeDom.Providers.DotNetCompilerPlatform" version="1.0.0" targetFramework="net45" />
      <package id="Microsoft.Net.Compilers" version="1.0.0" targetFramework="net45" developmentDependency="true" />
      <package id="Newtonsoft.Json" version="7.0.1" targetFramework="net45" />
      <package id="Zlib.Portable.Signed" version="1.11.0" targetFramework="net45" />
    </packages>

Setting up API Keys

Since we will be using Google APIs, we first need to set up a Google Cloud Vision API project.

1. For the Google Vision API, download the VisionAPI-xxxxxx.json file and save it in your project root directory.
2. For the Translation API, get the API key from the same page.

Back in the code, we first set up those API keys. Replace the placeholder values with the keys acquired above.

    using System;
    using System.Configuration;
    using System.Diagnostics;
    using System.IO;
    using System.Web.Http;

    namespace ThingTranslatorAPI2 {
      public class Global : System.Web.HttpApplication {
        // Translation API key, read by the controller
        public static String apiKey;

        protected void Application_Start() {
          GlobalConfiguration.Configure(WebApiConfig.Register);
          apiKey = "API-KEY";
          createEnvVar();
        }

        // Points GOOGLE_APPLICATION_CREDENTIALS at the Vision API key file.
        // If the variable is not set, the file is written from the "VisionApiKey" app setting.
        private static void createEnvVar() {
          var GAC = Environment.GetEnvironmentVariable("GOOGLE_APPLICATION_CREDENTIALS");
          if (GAC == null) {
            var VisionApiKey = ConfigurationManager.AppSettings["VisionApiKey"];
            if (VisionApiKey != null) {
              var path = System.Web.Hosting.HostingEnvironment.MapPath("~/") + "YOUR-API-KEY.json";
              Trace.TraceInformation("path: " + path);
              File.WriteAllText(path, VisionApiKey);
              Environment.SetEnvironmentVariable("GOOGLE_APPLICATION_CREDENTIALS", path);
            }
          }
        }
      }
    }


WebApiConfig, located in the App_Start folder, will contain the following. It tells the server to resolve routes using attribute routing rather than the default convention-based route configuration.


    using System.Web.Http;
    namespace ThingTranslatorAPI2 {
        public static class WebApiConfig {
            public static void Register(HttpConfiguration config) {
                // Web API routes: enable attribute routing
                config.MapHttpAttributeRoutes();
            }
        }
    }


API Controller

We need an API controller that will handle requests and process them. Each request should contain an image file and the code of the language we want the label translated into. Images are processed in memory, so there is no need to save them to disk.


    using System;
    using System.Linq;
    using System.Net;
    using System.Net.Http;
    using System.Threading.Tasks;
    using System.Web.Http;
    using System.Web.Http.Results;
    using GoogleApi;
    using GoogleApi.Entities.Translate.Translate.Request;
    using TranslationsResource = Google.Apis.Translate.v2.Data.TranslationsResource;

    namespace ThingTranslatorAPI2.Controllers {
      // Attribute routing needs explicit routes; the "api/upload" path is an assumption, adjust as needed
      [RoutePrefix("api")]
      public class TranslatorController : ApiController {
        [HttpPost]
        [Route("upload")]
        public async Task<JsonResult<Response>> Upload() {
          if (!Request.Content.IsMimeMultipartContent())
            throw new HttpResponseException(HttpStatusCode.UnsupportedMediaType);

          String langCode = string.Empty;
          var response = new Response();
          byte[] buffer = null;

          // Read all multipart sections into memory
          var provider = new MultipartMemoryStreamProvider();
          await Request.Content.ReadAsMultipartAsync(provider);

          // The image part becomes the buffer; any other part is read as the language code
          foreach (var content in provider.Contents) {
            if (content.Headers.ContentType != null && content.Headers.ContentType.MediaType.Contains("image"))
              buffer = await content.ReadAsByteArrayAsync();
            else
              langCode = await content.ReadAsStringAsync();
          }

          var labels = LabelDetectior.GetLabels(buffer);
          try {
            // Take the first label, which has the best match
            var bestMatch = labels[0].LabelAnnotations.FirstOrDefault()?.Description;
            String translateText;
            if (langCode == "en")
              translateText = bestMatch;
            else
              translateText = TranslateText(bestMatch, "en", langCode);

            // Original is our text in English
            response.Original = bestMatch;
            response.Translation = translateText;
          } catch (Exception ex) {
            response.Error = ex.Message;
            return Json(response);
          }
          return Json(response);
        }

        // Translate text from source to target language
        private String TranslateText(String text, String source, String target) {
          var _request = new TranslateRequest {
            Source = source,
            Target = target,
            Qs = new[] { text },
            Key = Global.apiKey
          };
          try {
            var _result = GoogleTranslate.Translate.Query(_request);
            return _result.Data.Translations.First().TranslatedText;
          } catch (Exception ex) {
            return ex.Message;
          }
        }
      }
    }

For image labeling, we need this class:

LabelDetectior.cs

    using Google.Apis.Auth.OAuth2;
    using Google.Apis.Services;
    using Google.Apis.Vision.v1;
    using Google.Apis.Vision.v1.Data;
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;

    namespace ThingTranslatorAPI2 {
      public class LabelDetectior {
        // Get labels from an image held in memory; returns null if the request fails
        public static IList<AnnotateImageResponse> GetLabels(byte[] imageArray) {
          try {
            VisionService vision = CreateAuthorizedClient();
            // Convert the image to Base64 for the JSON (ASCII text) request
            string imageContent = Convert.ToBase64String(imageArray);
            // Post the label detection request to the Vision API
            var responses = vision.Images.Annotate(
                new BatchAnnotateImagesRequest() {
                  Requests = new[] {
                    new AnnotateImageRequest() {
                      Features = new[] { new Feature() { Type = "LABEL_DETECTION" } },
                      Image = new Image() { Content = imageContent }
                    }
                  }
                }).Execute();
            return responses.Responses;
          }
          catch (Exception ex) {
            Trace.TraceError("GetLabels: " + ex.StackTrace);
          }
          return null;
        }

        // Returns an authorized Cloud Vision client, or null if authorization fails
        public static VisionService CreateAuthorizedClient() {
          try {
            GoogleCredential credential = GoogleCredential.GetApplicationDefaultAsync().Result;
            // Inject the Cloud Vision scopes
            if (credential.IsCreateScopedRequired) {
              credential = credential.CreateScoped(new[] {
                VisionService.Scope.CloudPlatform
              });
            }
            return new VisionService(new BaseClientService.Initializer {
              HttpClientInitializer = credential,
              GZipEnabled = false
            });
          } catch (Exception ex) {
            Trace.TraceError("CreateAuthorizedClient: " + ex.StackTrace);
          }
          return null;
        }
      }
    }

Response.cs will look like this

    namespace ThingTranslatorAPI2.Controllers {
      public class Response {
        public string Original { get; set; }
        public string Translation { get; set; }
        public string Error { get; set; }
      }
    }
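
To make the shape of the reply concrete, here is a minimal client-side sketch of reading it. It assumes the controller's Json(response) call serializes the three properties under their declared names, and it uses System.Text.Json on the client, which is my choice for illustration and not part of the original project. TranslationResponse and ResponseParser are hypothetical names.

```csharp
using System;
using System.Text.Json;

// Client-side mirror of the server's Response class (hypothetical name).
public class TranslationResponse {
  public string Original { get; set; }
  public string Translation { get; set; }
  public string Error { get; set; }
}

public static class ResponseParser {
  // Returns the translated text, or throws if the service reported an error.
  public static string ExtractTranslation(string json) {
    var response = JsonSerializer.Deserialize<TranslationResponse>(json);
    if (response == null)
      throw new InvalidOperationException("Empty response");
    if (!string.IsNullOrEmpty(response.Error))
      throw new InvalidOperationException(response.Error);
    return response.Translation;
  }
}
```

With a made-up payload such as {"Original":"dog","Translation":"perro","Error":null}, ExtractTranslation would hand back the Translation value; a payload with a non-empty Error surfaces as an exception instead.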

If you have any problems compiling the code, check the attached source code.

Now let's publish this to Azure. Go to Build -> Publish and fill in all four input boxes to match your Azure settings.

Now we can use Postman to test it.



We received a response that contains the image label in English and its translation into the language we specified with the langCode parameter.
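
The same request can be sent from code. The sketch below is one possible C# client, not the Android client from Part II. It relies on the controller's behavior of treating any part with an image content type as the picture and any other part as the language code. The form field name "image", the file name "photo.jpg", the JPEG content type, and the TranslatorClient class are assumptions for illustration; the URL you pass in depends on where you published the service and which route you configured.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class TranslatorClient {
  // Builds the multipart body: one image part plus one text part holding the language code.
  public static MultipartFormDataContent BuildUploadContent(byte[] imageBytes, string langCode) {
    var form = new MultipartFormDataContent();
    var image = new ByteArrayContent(imageBytes);
    // The controller looks for "image" in the media type to find the picture
    image.Headers.ContentType = new MediaTypeHeaderValue("image/jpeg");
    form.Add(image, "image", "photo.jpg");              // field and file names are assumptions
    form.Add(new StringContent(langCode), "langCode");  // sent as text/plain
    return form;
  }

  // Posts the image and language code, returning the raw JSON reply.
  public static async Task<string> UploadAsync(HttpClient http, string url, string imagePath, string langCode) {
    using (var form = BuildUploadContent(File.ReadAllBytes(imagePath), langCode)) {
      var reply = await http.PostAsync(url, form);
      reply.EnsureSuccessStatusCode();
      return await reply.Content.ReadAsStringAsync();
    }
  }
}
```

Keeping BuildUploadContent separate from the network call makes it possible to check the request body without a running service.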

Source code is available on GitHub

Points of Interest

There are a few APIs available today that can handle image labeling. One of them, which I found quite exotic, is called CloudSight. Although it is more accurate than the others, it relies on human tagging. The downside is that it takes more time than a machine to do the job: usually, the response is received after 10-30 seconds. Imagine running our app and hitting that timeout. Should we call it a connection timeout, or maybe a coffee-break timeout? ;-)
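
If a client does talk to a service that can take tens of seconds, it helps to put an explicit limit on the wait. The following is a generic sketch of that idea using a CancellationTokenSource; it is not tied to CloudSight or to the API built in this article, and SlowCallGuard is a hypothetical helper name. It only works if the wrapped call honors the cancellation token.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SlowCallGuard {
  // Runs the call with a deadline; returns the fallback value if the call is cancelled by the timeout.
  public static async Task<string> WithTimeout(
      Func<CancellationToken, Task<string>> call, TimeSpan timeout, string fallback) {
    using (var cts = new CancellationTokenSource(timeout)) {
      try {
        return await call(cts.Token);
      } catch (OperationCanceledException) {
        // Raised when the token fires before the call finishes
        return fallback;
      }
    }
  }
}
```

A caller could wrap a labeling request as `await SlowCallGuard.WithTimeout(token => SomeLabelCallAsync(token), TimeSpan.FromSeconds(15), "timed out")`, where SomeLabelCallAsync stands in for whatever request method the client uses.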

That's all for this tutorial. In the next article, we will build the Thing Translator app that consumes this API.

Happy coding!